PREFACE

Sampling Statistics and Applications is the second volume of 
Fundamentals of the Theory of Statistics; this second volume is 
intended for advanced students or research workers. The first 
volume is entitled Elementary Statistics and Applications and is 
designed for beginning courses in statistics. 

After reviewing the basic concepts and definitions in Sampling 
Statistics and Applications, the authors discuss the general theory
of frequency curves and the theory of random sampling. Important
sampling distributions are derived, and their applications to
a variety of problems are illustrated. Exact methods applicable
primarily to normal populations and approximate methods used
in sampling from discrete populations and from continuous
non-normal populations are considered. Theoretical discussion is
illustrated throughout to show real-life applications. The 
character of assumptions involved in theory is explicitly treated; 
the problems that such assumptions present when the statistician 
is confronted with practical applications are illustrated. 

The arrangement of the material in the book is based partly on 
grounds of logic and partly on practical considerations. The 
theory of frequency curves is general in scope and is therefore 
presented first. The theory of sampling, being an elaboration 
of a special part of the theory of frequency curves, is discussed 
after the more general theory. Within the discussion of the 
theory of sampling, the more elementary aspects are examined 
before approaching complex problems. This leads to a separation
of some material that might logically be run together. It
gives the instructor greater freedom, however, in the selection of
material appropriate for the level of his class, while the arrangement
is such that he can assign material in logical sequence if he
so desires. 

For the most part theory and application have been discussed 
together in this volume. In certain cases, however, in which 
theoretical discussion is especially elaborate, its applications to 
practical problems are treated separately. Numerical calculations
for frequency curves, for example, are discussed in a separate
chapter following the chapters devoted to the theory of frequency 
curves. Again, the uses of the sampling distributions of the 
mean, standard deviation, etc., are discussed in a chapter 
separate from that in which the theoretical derivation of these 
distributions is presented. On the whole, the arrangement of 
material is designed to facilitate the use of the book in the school 
or research laboratory as well as in the classroom. 

Progress in sampling theory and its application has been rapid 
during the past 25 years. Much has been accomplished in 
deriving exact sampling distributions for important statistics 
and in clarifying the assumptions on which statistical analysis 
rests. Discussion has also been active regarding the concept of 
probability likely to be most fruitful for statistical research, and 
increasing attention has been paid to the logic of statistical 
inference. This development of theory has been accompanied 
by progress in method and application. The discovery of more 
exact sampling distributions, for example, has led to greater 
emphasis upon careful design of experiments and less emphasis 
upon size of sample. Quality of method has, in many instances, 
displaced reliance only on quantity of numbers. 

Sampling Statistics and Applications represents an attempt to 
coordinate the new theory and applications with the old, to 
place the whole upon a logically consistent basis, and to put the 
subdivisions in proper relation to each other. To this end a 
concept of probability that at present appears most fruitful for 
statistical research was adopted, and the theory of sampling is 
explained in those terms. Elaboration of special techniques 
regarding the design of experiment and analysis of variance is 
not included since the primary intent is to emphasize fundamentals
of the theory of statistics. Explanations both of old
and of new methods start with the fundamentals; and, unless
otherwise indicated, a particular exposition includes all the
argument. If a highly mathematical step of an argument is
omitted or abbreviated, this is so stated. Some of the mathematical
portions are included in the text; other more advanced
mathematical material is placed in appendixes to some of the 
chapters. These mathematical parts are presented in such a 
way as to be readily teachable. 

The authors have drawn freely upon the many monographs
and the periodical literature that have appeared during recent
years. Care has been exercised to make acknowledgment in
footnotes to the sources of new ideas that have been incorporated 
into the authors’ own development of the subject. To all these 
vigorous workers in the field, too numerous to be listed by name, 
the authors are greatly indebted. The authors especially 
acknowledge a debt of gratitude to John H. Smith of the Bureau 
of Labor Statistics, who contributed many stimulating criticisms 
and suggestions that inspired important improvements in this 
book. The authors are grateful to the International Finance 
Section of Princeton University for the financial assistance that 
was given Acheson J. Duncan some years ago to study statistics 
and mathematical economics with the late Henry Schultz of 
the University of Chicago and with Harold Hotelling of Columbia 
University. The authors are indebted also to those men for help 
and guidance in the study of statistics. 

Naturally it is not to be supposed that the whole, or any part,
of the manuscript carries the endorsement of the authors' former
teachers or of those who have helped with criticisms of the
manuscript. The authors assume full responsibility for any errors of
theory or calculation that may be present in the volume.

They are indebted to Prof. R. A. Fisher, also to Oliver & Boyd, 
Ltd., of Edinburgh, for permission to reprint Tables III and IV
from their book Statistical Methods for Research Workers. As 
indicated in notes making specific acknowledgment in the text 
and in the Appendix, the authors are also indebted to others for 

INTRODUCTION

CHAPTER I 

DEFINITIONS AND BASIC CONCEPTS 

Frequency Distributions. Time-honored observation has 
made commonplace the eighteenth-century discovery that static 
variability in almost any conceivable attribute of an object, 
event, or condition follows a surprisingly uniform pattern. 1 
The uniform pattern of static variability is revealed by arranging 
the variable in a frequency distribution or frequency series, which 
is simply a summary of an array of the variable arranged from 
smallest to largest. It is also called a “monovariate.” When
graphed, the figure is called a “histogram,” “frequency polygon,”
or “frequency curve,” depending upon the manner of graphing.
Two types of frequency series are to be carefully distinguished, 
discrete frequency series and continuous frequency series. 

Discrete Frequency Series. A discrete frequency series is one 
in which the variation, by the nature of the variable, is in distinct 
steps. The variation in size of men’s shoes occurs by distinct 
steps, not by infinitesimal differences. Accordingly, a frequency 
series describing the number of men’s shoes of various sizes is a 
discrete frequency series. The same statement could be made 
with respect to almost any item of clothing manufactured in 
sizes for mass consumption. 

The graph of a discrete frequency series should be in the form 
of a histogram, not a frequency curve; for the latter would suggest
continuous variability, which is contrary to fact in the case
of a discrete series. 

Continuous Frequency Series. A continuous series is one
representing a phenomenon that varies by infinitesimal amounts.
For example, the frequency distribution of weights or heights of
people of some specified age is continuous in character; for the
differences among a large number of people in weight or in height
are in fact by infinitesimal amounts. A continuous series may
have the appearance in a frequency table of the same discreteness
as a discrete series; but this is because the arbitrarily discrete
character of the unit of measurement eclipses the actual continuous
character of the data.

1 Static variability refers to variability which is not correlated with time
or in which time is “held constant.” Variability correlated with time, i.e.,
dynamic variability, may assume a great variety of patterns. See Smith,
J. G., and A. J. Duncan, Elementary Statistics and Applications, Chaps. V
and XIX.

Fig. 1. Maternal mortality in cities of 100,000 or more population in the United
States in 1938. (Deaths per 1,000 live births.)

The graph of a continuous frequency series may appropriately
be in the form of a frequency curve, but when the number of
cases observed is comparatively small, a continuous frequency
series is often pictured by a histogram.

Histograms and Frequency Curves. Figure 1 is an illustration
of a histogram; it is a graph of the frequency distribution presented
in the first two columns of Table 1. In this figure the
frequency of any class interval is represented by a rectangle 
erected on that interval as a base and with a height equal to the 
observed frequency. 

For many types of analysis it is preferable to present the frequency
distribution and the corresponding histogram in a form
in which the area of a rectangle represents the proportional frequency
of an interval. Figure 2 is an illustration of such a
histogram; it is a graph of the third column of Table 1, with the
same scale for the variable as that used in Fig. 1, namely, that shown
in column (1) of Table 1.

Table 1. Maternal Mortality in Cities of 100,000 or More
Population in the United States in 1938

        (1)                 (2)               (3)
  Deaths per 1,000       Number of       Proportional
    live births            cities       number of cities
         X                   F                F/N
        1-                    2               .022
        2-                   16               .172
        3-                   18               .193
        4-                   20               .215
        5-                   15               .161
        6-                   10               .108
        7-                    4               .043
        8-                    6               .064
        9-                    0               .000
       10-                    2               .022
      Total              Σ = 93           Σ = 1.000


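In histograms of the type shown in Fig. 2, the total area of the
histogram always equals 1, and each bar represents a fraction of 1.
Because each class interval of Table 1 is one unit wide, the height
of each bar is numerically equal to its proportional frequency.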

Now suppose that the data from which the histogram has been
constructed were a sample from a very large set of cases, theoretically
an infinite set. For example, the data might be the
heights of 100 adult males of the white race, instead of the 
mortality statistics above illustrated. The 100 heights, then, 
would be a sample of the heights of all adult men of that race, 
presumably millions of men. If the size of the sample were 
increased, the class interval could be reduced without causing 
irregularities in the form of the plotted histogram. In fact, if 
the number in the sample is made larger and larger and at the 
same time the size of the class interval is continuously reduced, 
the histogram will tend to become more and more regular and 
the tops of the rectangles, which are getting narrower and
narrower, will come closer and closer to forming a smooth con¬ 
tinuous curve (a frequency curve). 
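
The limiting process just described can be imitated numerically. The following Python sketch is illustrative only: it assumes a standard normal population in place of the heights of adult males (the text names no particular population), and the helper names `area_histogram`, `normal_density`, and `max_gap` are hypothetical. It builds area histograms of proportional frequencies, in which each bar's height is (F/N) divided by the width of the class interval, first for 100 cases with a unit interval and then for 200,000 cases with an interval of 0.1, and reports how far the bar tops lie from the curve.

```python
import math
import random


def area_histogram(values, width):
    """Area histogram: {interval mid-point: bar height}.

    Illustrative helper (hypothetical name). Each height is (F / N) / width,
    so every bar's area equals its proportional frequency.
    """
    n = len(values)
    origin = math.floor(min(values) / width) * width  # left edge of first interval
    counts = {}
    for v in values:
        k = math.floor((v - origin) / width)  # which interval v falls in
        counts[k] = counts.get(k, 0) + 1
    return {origin + (k + 0.5) * width: c / (n * width) for k, c in counts.items()}


def normal_density(x):
    """Standard normal frequency curve: the assumed population."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def max_gap(hist):
    """Largest gap between a bar top and the curve at the interval mid-point."""
    return max(abs(h - normal_density(m)) for m, h in hist.items())


rng = random.Random(1)  # fixed seed so the run is reproducible
coarse = area_histogram([rng.gauss(0, 1) for _ in range(100)], 1.0)
fine = area_histogram([rng.gauss(0, 1) for _ in range(200_000)], 0.1)

print("total areas:", sum(h * 1.0 for h in coarse.values()),
      sum(h * 0.1 for h in fine.values()))
print("largest gap, 100 cases, interval 1.0:", round(max_gap(coarse), 3))
print("largest gap, 200,000 cases, interval 0.1:", round(max_gap(fine), 3))
```

Both histograms have a total area of 1; with the larger sample and the narrower interval the bar tops should lie much closer to the curve, although the exact gaps depend on the random seed.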

Fig. 2. Maternal mortality in cities of 100,000 or more population in the United
States in 1938. (Deaths per 1,000 live births.)



In such a manner, the frequency curve may be viewed as 
the limit that an area histogram of proportional frequencies 
approaches as the number of cases is increased and the size of 
the class interval is reduced indefinitely. The frequency curve 
depicts the distribution of a theoretically infinite set of data, 
with a theoretically infinitesimal class interval. Figure 3 illustrates
a frequency curve.

Being the limit approached by an area histogram of proportional
frequencies, the frequency curve has a total area between
the curve and the X-axis that is always equal to 1. Furthermore,
any section of area under the curve will give the relative frequency
of the cases falling within the class interval marking off
that section of area.

Populations, Parameters, and Statistics. To say that the 
population of the United States is some hundred and thirty 
million people is a familiar use of the word “population.” In
statistics the word is used in the same familiar sense, but it is also
used in a more general sense. In statistics the term “population”
(or sometimes “universe”) refers to the enumeration of
persons or animals of any kind or even to the enumeration of 
inanimate things. In statistics, population refers to all the 
objects of a defined kind in existence in some specified universe; 
for example, all the adult males in the United States constitute 
a population, as do all the automobiles in the United States or 
all the three-year-old steers in the United States. 

The measurements of characteristics of a population are called 
“parameters.” The average height of all adult males in the 
United States is a parameter of that population. The average 
weight of all the three-year-old steers in the United States is a 
parameter of that population. Rarely are the parameters of any
of these populations actually measured. In practice, it is much
easier and more practical to estimate the parameter by taking 
the average or measuring the corresponding characteristic of a 
sample from the population in question. This latter measure 
is called a “statistic.” Thus parameters are the characteristics 
of the population, and the corresponding sample statistics are the 
measures of the corresponding characteristics of samples. 
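
A small simulation may make the distinction concrete. This Python sketch works under a stated assumption: the "population" is 100,000 invented heights drawn from a normal distribution with mean 68 inches and standard deviation 3 (hypothetical figures, not data from the text), so that its parameter can actually be computed, which is rarely possible in practice.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for a reproducible illustration

# Hypothetical population of 100,000 heights in inches (invented figures).
population = [rng.gauss(68.0, 3.0) for _ in range(100_000)]
parameter = statistics.fmean(population)   # population mean: a parameter

sample = rng.sample(population, 1_000)     # random sample, without replacement
statistic = statistics.fmean(sample)       # sample mean: a statistic

print(f"parameter = {parameter:.2f}   statistic = {statistic:.2f}")
```

The statistic will generally differ a little from the parameter; how much it may be expected to differ is the subject of sampling theory.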

Averages. In addition to the familiar arithmetic mean, other 
averages are used as devices for summarizing or comparing. The 
median and mode are often used in statistical analysis; less frequently,
but necessarily for certain purposes, 1 the geometric
mean and the harmonic mean are used.

By definition the arithmetic mean is the sum of the values of the
variable divided by the number of values, i.e.,

$$\bar{X} = \frac{\sum X}{N} \tag{1}$$

1 Cf. Smith and Duncan, op. cit., pp. 173-179.

When the variable is arranged in a frequency distribution, this
equation for the mean generally appears as

$$\bar{X} = \frac{\sum FX}{N}$$

to represent the fact that the several class-interval values of X
have their respective frequencies, so that

$$\sum FX = F_1X_1 + F_2X_2 + \cdots + F_nX_n.$$

It is important to note that, when the frequency distribution
is expressed proportionally, as in the third column of Table 1,
it becomes a distribution of probability. 1 The arithmetic mean
of a distribution of probability, i.e., of a proportional frequency
distribution, is equal to Σ(F/N)X. Expressed in the symbols of
probability, this equation is

$$\bar{X} = \sum P(X)\,X \tag{2}$$

Table 2 illustrates the calculation of the arithmetic mean of the 
frequency distribution shown in Table 1. 

Table 2. Calculation of the Arithmetic-mean Maternal Mortality
in Cities of 100,000 or More in 1938

  Deaths per       X
  1,000 live       Mid-point of                                   (F/N)X
  births           class interval     F       FX      P(X)      or P(X)X
     1-                1.5            2       3.0     .022        .033
     2-                2.5           16      40.0     .172        .430
     3-                3.5           18      63.0     .193        .676
     4-                4.5           20      90.0     .215        .968
     5-                5.5           15      82.5     .161        .886
     6-                6.5           10      65.0     .108        .702
     7-                7.5            4      30.0     .043        .322
     8-                8.5            6      51.0     .064        .544
     9-                9.5            0       0.0     .000        .000
    10-               10.5            2      21.0     .022        .231
  Total                              93     445.5    1.000       4.792


1 See ibid., p. 254. 

Accordingly,

$$\bar{X} = \sum P(X)\,X = 4.792$$

The sum of the column of FX's must be divided by N (= 93)
in order to find the mean, whereas the sum of the column of
P(X)X's is the mean. It will be noted that taking the mid-point
of each class interval as the value of X for that interval is
equivalent to assuming that the frequencies within
each class interval are so distributed that their average is equal
to the mid-point. This assumption causes negligible error in
the calculation of the mean. 1
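
The two routes to the mean can be checked against Tables 1 and 2 with a few lines of Python (the variable names are illustrative). Dividing ΣFX = 445.5 by N = 93 gives about 4.790; summing P(X)X with the three-decimal proportions of Table 1 gives about 4.791, and the 4.792 printed in Table 2 also reflects rounding each P(X)X product to three decimals. The small differences are therefore rounding effects, not a disagreement between the two methods.

```python
# Table 1 / Table 2 data: class mid-points X and frequencies F.
midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
# Proportional frequencies P(X) as printed (rounded to three decimals).
props = [.022, .172, .193, .215, .161, .108, .043, .064, .000, .022]

N = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
mean_from_f = sum_fx / N                                    # sum FX / N
mean_from_p = sum(p * x for p, x in zip(props, midpoints))  # equation (2)

print(N, sum_fx, round(mean_from_f, 3), round(mean_from_p, 3))
```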


Fig. 4. Median and quartiles of mortality rates in selected cities of the United
States. (Q₁, Mi, and Q₃ divide the cities into four groups, each containing 25 per
cent of the cities.)


The median is a position average; by definition, the median 
is that value than which there is an equal number of cases larger 
and smaller. When the cases are arranged in an array, the 
median is either the value of the middle one (when there is an 
odd number of cases) or some value between the two middle ones 
(when there is an even number of cases). Ordinarily in the latter 
instance the arithmetic mean of the two middle cases is taken 
as the median value. The symbol for the median is Mi. 
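
The definition of the median of an array translates directly into code. A minimal Python sketch (the function name is illustrative) that follows the convention of taking the arithmetic mean of the two middle cases when their number is even:

```python
def median(values):
    """Median of an array (illustrative helper, hypothetical name)."""
    if not values:
        raise ValueError("the median of an empty array is undefined")
    array = sorted(values)        # arrange the cases from smallest to largest
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]         # odd number of cases: the middle one
    return (array[mid - 1] + array[mid]) / 2  # even: mean of the two middle ones


print(median([4.1, 2.7, 5.0]), median([3, 1, 4, 2]))
```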

The first quartile, Q₁, is the value below which one fourth of
the cases fall and above which three fourths of the cases fall.
The third quartile, Q₃, is that value below which three fourths
of the cases fall and above which one fourth of the cases fall. 
The median, obviously, is equivalent to the second quartile. 
If the cases are arranged in an array, it should be apparent from 
the definitions of the median and the quartiles that these three 
values divide the cases into four parts, with one fourth of the 
cases in each part. This is illustrated graphically in Fig. 4. 

1 When the standard deviation is so calculated, as it usually is, it is necessary
to correct for error due to the assumption made; the correction is called
“Sheppard’s correction.” Cf. pp. 10, 12.

For the frequency distribution shown in Tables 1 and 2, the
median is found by interpolating the value of the middle (46½)
case, to be 4.53; Q₁ is found by interpolating the N/4 case to
be 3.3; and Q₃ is found by interpolating the 3N/4 case to be
5.9. As shown in Fig. 4, the significance of these three values is
that, in 1938, 25 per cent of the cities of 100,000 or more population
had a maternal death rate of less than 3.3 per 1,000 live
births; 25 per cent of the cities had maternal death rates between
3.3 and 4.5 per 1,000 live births; 25 per cent had maternal death
rates between 4.5 and 5.9 per 1,000 live births; and the remaining
25 per cent of the cities had maternal death rates greater than
5.9 per 1,000 live births.
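
The interpolation behind these figures can be sketched in Python as follows. The sketch assumes, as the interpolation itself does, that the cases in each class interval are spread evenly over the interval; `grouped_quantile` is a hypothetical helper name. It finds the class containing the pN-th case, where p is the fraction of cases (½ for the median, ¼ for Q₁, ¾ for Q₃), and moves into that class in proportion to how many of its cases are needed. With the Table 1 frequencies it gives 4.525, 3.292, and 5.917, which the text rounds to 4.53, 3.3, and 5.9; passing the fractions .1 through .9 gives the nine deciles discussed in the next paragraph.

```python
def grouped_quantile(lower_limits, width, freqs, p):
    """Value below which the fraction p of the N cases falls (hypothetical helper).

    Assumes the cases in each class interval are spread evenly over it.
    """
    target = p * sum(freqs)       # e.g. N/2 for the median, N/4 for Q1
    below = 0                     # cumulative frequency below the current class
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f
    raise ValueError("p must lie between 0 and 1")


lower_limits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # lower class limits, Table 1
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

q1 = grouped_quantile(lower_limits, 1, freqs, 0.25)
median = grouped_quantile(lower_limits, 1, freqs, 0.50)
q3 = grouped_quantile(lower_limits, 1, freqs, 0.75)
deciles = [grouped_quantile(lower_limits, 1, freqs, k / 10) for k in range(1, 10)]
print(f"Q1 = {q1:.3f}, median = {median:.3f}, Q3 = {q3:.3f}")
```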

If it were desired to obtain a more detailed description of the
distribution of these cities with respect to maternal death rates,
the distribution could be divided into 10 parts, each containing
10 per cent of the cities. The dividing points of these 10 parts
would be called “deciles,” instead of quartiles; there would be
9 deciles, and the fifth would be the same value as the median.
Similarly, if with a larger number of cases it were desired to
obtain a description of the distribution sufficiently refined to say
within what limits each 1 per cent of the group of cities lies, the
distribution could be divided into 100 parts, each containing 1 per
cent of the cities. The dividing points of these 100 parts would
be called “percentiles”; there would be 99 percentiles, of
which the fiftieth would be the same value as the median.

Another commonly used average, the mode, is described in 
terms of relative frequency of occurrence. It is the magnitude 
that occurs more frequently than any other. The mode is the 
most probable value. When the data are presented in a fre¬ 
quency distribution, it is necessary to interpolate for the mode. 
The procedure may be illustrated by finding the mode in the 
frequency distribution shown in Tables 1 and 2. It may be 
assumed that the mode lies somewhere between 4 and 5, for 
more cases (20) lie in that class interval than in any other of the 
class intervals. The mode is equal to the lower limit of the modal 
class interval plus the interpolated part of the class interval 
established by the relationship of the frequencies above and 
below that class interval. Thirty-six frequencies lie below the
modal class interval, and 37 frequencies lie above it. Since the
class interval is 1 unit wide, the mode is equal to

$$\text{Mode} = 4 + \frac{37}{37 + 36}(1) \approx 4.5.$$
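
The same procedure in code, as a minimal Python sketch with a hypothetical function name. It follows the text in weighting by the total frequencies above and below the modal class; other texts interpolate using only the frequencies of the two adjacent classes, which can give a somewhat different value.

```python
def interpolated_mode(lower_limits, width, freqs):
    """Mode by the procedure in the text (hypothetical helper name).

    The modal class has the largest frequency; the mode is placed within it
    in proportion to the total frequencies lying above and below that class.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    below = sum(freqs[:k])         # frequencies in all classes below it
    above = sum(freqs[k + 1:])     # frequencies in all classes above it
    if above + below == 0:         # only the modal class is occupied
        return lower_limits[k] + width / 2
    return lower_limits[k] + above / (above + below) * width


mode = interpolated_mode([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1,
                         [2, 16, 18, 20, 15, 10, 4, 6, 0, 2])
print(f"mode = {mode:.3f}")
```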

The geometric mean is the Nth root of the product of the N values of a
variable X. Thus, the geometric mean of 5, 8, and 25 is the cube root of
5 × 8 × 25 = 1,000, namely 10. The geometric mean may also be defined
as the antilogarithm of the arithmetic mean of the logarithms
of the values of X, i.e.,

$$\log \text{G.M.} = \frac{\sum \log X}{N} \tag{3}$$
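
Equation (3) suggests computing the geometric mean through logarithms rather than by forming the product directly, which also avoids very large intermediate products. A minimal Python sketch (the function name is illustrative):

```python
import math


def geometric_mean(values):
    """Equation (3): antilog of the mean of the logarithms (hypothetical name)."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean is defined here only for positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


print(geometric_mean([5, 8, 25]))  # the cube root of 1,000
```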


The harmonic mean is the reciprocal of the average of the
reciprocals of the N values of a variable X, thus:

$$\text{H.M.} = \frac{N}{\sum \frac{1}{X}} \tag{4}$$
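
A corresponding Python sketch for equation (4) (the function name is illustrative). For 5, 8, and 25 it gives 3/.365, about 8.22, compared with a geometric mean of 10 and an arithmetic mean of about 12.67; for positive values that are not all equal, the harmonic mean is always smaller than the geometric mean, which in turn is smaller than the arithmetic mean.

```python
def harmonic_mean(values):
    """Equation (4): N divided by the sum of the reciprocals (hypothetical name)."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch assumes positive values")
    return len(values) / sum(1 / v for v in values)


print(round(harmonic_mean([5, 8, 25]), 2))
```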


Moments. In statistics the term “moment” has been taken 
over from physics. In physics, moment is a measure of a force 
with respect to its tendency to produce rotation. The strength 
of the tendency depends on the amount of force and the distance 
from the origin of the point at which the force is applied. If the 
arithmetic mean is taken as the origin in a frequency distribution 
and the frequencies in each class interval (F₁, F₂, . . . , Fₙ) are
taken as the forces, at distances x₁, x₂, . . . , xₙ, the moments
(more exactly, moment coefficients, since the physical moments
are divided by N) are defined as follows:

$$\mu_1 = \frac{\sum Fx}{N}, \qquad \mu_2 = \frac{\sum Fx^2}{N}, \qquad \mu_3 = \frac{\sum Fx^3}{N}, \qquad \mu_4 = \frac{\sum Fx^4}{N} \tag{5}$$

in which x = X − X̄.

The Greek letter mu (μ) is used to describe the moments about
the arithmetic mean. When the deviations are measured about
an arbitrary origin, instead of about the arithmetic mean,
the symbol used to represent the moments is the Greek letter
nu (ν). Generally speaking, it is more convenient first to
calculate the moments about an arbitrary origin and by the use
of the following equations to obtain the moments about the mean:

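$$\begin{aligned}
\mu_1 &= \nu_1 - \nu_1 = 0\\
\mu_2 &= \nu_2 - \nu_1^2\\
\mu_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3\\
\mu_4 &= \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4
\end{aligned} \tag{6}$$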
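
The conversion can be verified numerically. This Python sketch uses the Table 1 frequencies with class mid-points as the X's (Sheppard's correction is not applied), takes 4.5 as an arbitrary origin, converts the resulting ν's by the equations above, and compares the results with moments computed directly about the mean. The function names are hypothetical.

```python
def moments_about(xs, fs, origin, order=4):
    """sum F (X - origin)**r / N for r = 1..order (hypothetical helper)."""
    n = sum(fs)
    return [sum(f * (x - origin) ** r for x, f in zip(xs, fs)) / n
            for r in range(1, order + 1)]


def central_from_raw(v1, v2, v3, v4):
    """Moments about the mean from the nu's, by the conversion equations."""
    mu2 = v2 - v1 ** 2
    mu3 = v3 - 3 * v2 * v1 + 2 * v1 ** 3
    mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4
    return (0.0, mu2, mu3, mu4)


midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]  # Table 2
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]

nu = moments_about(midpoints, freqs, origin=4.5)  # arbitrary origin
mu_via_origin = central_from_raw(*nu)

mean = 4.5 + nu[0]                                # nu_1 locates the mean
mu_direct = moments_about(midpoints, freqs, origin=mean)
print([round(m, 4) for m in mu_via_origin])
print([round(m, 4) for m in mu_direct])
```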

By the use of Sheppard’s correction, μ₂ and μ₄ must be adjusted
for errors involved in the use of mid-points of class intervals as
the X’s in the process of calculation. 1 When it is desired to use a
symbol for a population parameter in the present text, μ is
printed in boldface type (𝛍). For the most part, statisticians
deal with statistics and speculate about parameters. Seldom
is it possible to specify the value of a parameter. But it is often
possible to make a maximum likelihood estimate of a population
parameter; the symbol for this is the symbol for the corresponding
statistic, with a breve above it. Thus the maximum likelihood
estimates of the population moments are represented by
μ̆₁, μ̆₂, . . . , μ̆ₙ.

Types of Frequency Distribution. The moments are important 
because it is possible to calculate from them precise measures 
that will distinguish types of frequency distributions. The 
measures used by Karl Pearson to distinguish types of frequency 
distributions are the moments and functions of the moments. 2 
Certain functions of the moments are called
“betas.” The first two are defined as follows:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} \tag{7}$$

If a frequency distribution is normal, β₁ = 0 and β₂ = 3. The
degree to which the betas depart from these values is therefore a
precise measure of the degree to which a frequency curve is not
of the normal type.


1 Cf. pp. 12, 132-134. It should be noted that Sheppard’s correction is
for the purpose of removing a bias; it does not correct for inaccuracies
brought about by using class intervals that are too large. Goulden, C. H.,
Methods of Statistical Analysis (1939), p. 15.

2 Cf. pp. 134-137.

The value of β₁ is related to the skewness of a curve, i.e., to
whether the mode of the distribution is greater or less than
the mean. If the mode is less than the mean, μ₃ is a positive quantity
and the distribution is positively skewed; if the mode is greater
than the mean, μ₃ is a negative quantity and the distribution is
negatively skewed. The square root of β₁, with the sign of the
third moment, is often taken as a measure of skewness. 1

The value of β₂ is related to whether the frequency curve is flat
topped or peaked. When it is flat topped, the shoulders of the
curve are filled out and the tails depleted. When peaked, the
frequency curve is higher at the center and the tails are also
higher. The broad-shouldered frequency curve is said to
be “platykurtic,” and the narrow, or peaked, one is called
“leptokurtic.” The statistic β₂ is said, therefore, “to measure
kurtosis.” If β₂ is more than 3, the frequency curve is a peaked
one; if β₂ is less than 3, the frequency curve is a broad-shouldered
one. For the normal curve, β₂ = 3.
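
As an illustration, a Python sketch computing β₁ and β₂ for the Table 1 data, using mid-points and no Sheppard's correction (the function names are hypothetical). For those data μ₃ comes out positive, consistent with a mode (about 4.5) below the mean (about 4.79), and β₂ comes out somewhat above 3. A two-point distribution at −1 and +1 serves as a check, since it has β₁ = 0 and β₂ = 1.

```python
def central_moments(xs, fs):
    """mu_2, mu_3, mu_4 about the arithmetic mean (hypothetical helper)."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return [sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n for r in (2, 3, 4)]


def betas(xs, fs):
    """beta_1 = mu_3**2 / mu_2**3 and beta_2 = mu_4 / mu_2**2, plus mu_3 itself."""
    mu2, mu3, mu4 = central_moments(xs, fs)
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2, mu3


def describe(xs, fs):
    """Direction of skewness (sign of mu_3) and type of kurtosis (beta_2 vs 3)."""
    _, b2, mu3 = betas(xs, fs)
    skew = "positive" if mu3 > 0 else "negative" if mu3 < 0 else "none"
    kurt = "leptokurtic" if b2 > 3 else "platykurtic" if b2 < 3 else "mesokurtic"
    return skew, kurt


midpoints = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5]  # Table 2
freqs = [2, 16, 18, 20, 15, 10, 4, 6, 0, 2]
b1, b2, mu3 = betas(midpoints, freqs)
print(f"beta_1 = {b1:.3f}, beta_2 = {b2:.3f}", describe(midpoints, freqs))
```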

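Because of their theoretical advantages, R. A. Fisher 2 has
suggested the use of certain k and g statistics instead of the
moment and β statistics. These are defined as follows: 3

$$\begin{aligned}
k_2 &= \frac{\sum x^2}{N-1}\\
k_3 &= \frac{N\sum x^3}{(N-1)(N-2)}\\
k_4 &= \frac{N\left[(N+1)\sum x^4 - \frac{3(N-1)}{N}\left(\sum x^2\right)^2\right]}{(N-1)(N-2)(N-3)}
\end{aligned} \tag{8}$$

and

$$g_1 = \frac{k_3}{k_2^{3/2}}, \qquad g_2 = \frac{k_4}{k_2^2} \tag{9}$$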

1 For more complete discussion of skewness and of the normal frequency
curve, see Smith and Duncan, op. cit., pp. 226-230, 263-267, 285-306.

2 Statistical Methods for Research Workers, Appendix B to Chap. III.

3 Cf. Fisher, R. A., Statistical Methods for Research Workers, Chap. III,
Appendix on Technical Notation and Formulae (1932), p. 74; and Goulden,
C. H., op. cit., p. 29.

For large values of N, k₂ and k₃ are practically the same as
μ₂ and μ₃ (k₁ being the mean itself), and k₄ equals approximately μ₄ − 3μ₂².

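It is necessary to apply Sheppard’s correction to the second and
fourth k statistics, as follows:

$$k_2 = k_2\,(\text{uncorrected}) - \frac{h^2}{12}, \qquad k_4 = k_4\,(\text{uncorrected}) + \frac{h^4}{120}$$

in which h is the width of the class interval.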

Also, for large values of N, g₁ is practically equal to √β₁ (taken with
the sign of μ₃), and g₂ equals approximately β₂ − 3. Thus, it is readily
seen that g₁ relates to the measurement of skewness and g₂ relates to the
measurement of kurtosis.
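
A Python sketch of the k and g statistics for ungrouped data, following the formulas above with x measured from the sample mean (the function name is hypothetical; no Sheppard's correction is applied). As a check, k₂ should agree with the usual sample variance with divisor N − 1, which Python's `statistics.variance` computes, and for the values 1, 2, 3, 4, 5 the formulas give k₃ = 0, k₄ = −7.5, and g₂ = −1.2.

```python
import statistics


def k_and_g(data):
    """Fisher's k2, k3, k4 and g1, g2 for ungrouped data (hypothetical name).

    x denotes deviations from the sample mean; no Sheppard's correction.
    """
    n = len(data)
    if n < 4:
        raise ValueError("k4 requires at least four observations")
    mean = sum(data) / n
    s2 = sum((v - mean) ** 2 for v in data)   # sum x^2
    s3 = sum((v - mean) ** 3 for v in data)   # sum x^3
    s4 = sum((v - mean) ** 4 for v in data)   # sum x^4
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return k2, k3, k4, k3 / k2 ** 1.5, k4 / k2 ** 2


data = [2.0, 3.5, 7.1, 4.4, 9.0]
print(k_and_g(data)[0], statistics.variance(data))  # the two should agree
print(k_and_g([1, 2, 3, 4, 5]))
```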

Standard Deviation and the Variance. From the second
moment about the arithmetic mean a measure of the dispersion
of the frequency distribution is obtained. The second moment
itself is called the “variance.” The square root of the second
moment is called the “standard deviation”:

$$\sigma = \sqrt{\mu_2} \tag{10}$$

The standard deviation gives a measure of the dispersion of
the frequency distribution that for most purposes is preferable
to the average deviation described below. A distance of one standard
deviation measured above and below the arithmetic mean of a normal
frequency distribution includes about 68 per cent of the
cases. Twice the standard deviation measured above and below
the arithmetic mean of a normal frequency distribution includes
all but about 5 per cent of the cases. When a normal distribution of
proportional frequencies is regarded as a distribution of probabilities,
it follows that the probability of a case falling beyond the
limits X̄ ± 2σ of a normal frequency distribution is approximately
.05, and that the probability of a case falling below X̄ − 2σ, or
above X̄ + 2σ, is approximately .025. These facts are of basic
importance in sampling theory when the normal curve is used.
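
The normal-curve percentages quoted above can be checked with the error function in Python's standard library (`math.erf`); the helper name is illustrative. The figures in the text are rounded: the limits X̄ ± 2σ leave about 4.6 per cent of the cases outside (about 2.3 per cent in each tail), and the proportion outside reaches .05 at about 1.96σ.

```python
import math


def normal_within(z):
    """Proportion of a normal distribution within mean +/- z standard deviations."""
    return math.erf(z / math.sqrt(2))


within_1 = normal_within(1)             # about .683
beyond_2 = 1 - normal_within(2)         # both tails beyond +/- 2 sigma, about .046
one_tail_2 = beyond_2 / 2               # one tail, about .023
beyond_196 = 1 - normal_within(1.96)    # about .050
print(f"{within_1:.4f} {beyond_2:.4f} {one_tail_2:.4f} {beyond_196:.4f}")
```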

Average Deviation. The average deviation is a measure
of dispersion that has its minimum value when deviations are
measured from the median. To compute the average deviation
from the median, subtract each of the N values of X from the
median, add the absolute values of the deviations, and divide
the sum by N. Thus,

$$\text{A.D.} = \frac{\sum F\,|X - \text{Mi}|}{N} \tag{11}$$
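
The minimizing property of the median can be seen in a small Python example. The data below are hypothetical and include one extreme case; the sketch (function name illustrative) evaluates the average deviation about every point of a fine grid of candidate centres and finds that none gives a smaller value than the median does.

```python
def average_deviation(values, freqs, centre):
    """sum F |X - centre| / N, as in equation (11) when centre is the median."""
    n = sum(freqs)
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / n


values = [1, 2, 3, 4, 100]   # hypothetical data with one extreme case
freqs = [1, 1, 1, 1, 1]
ad_median = average_deviation(values, freqs, 3)    # the median is 3
ad_mean = average_deviation(values, freqs, 22)     # the arithmetic mean is 22

grid = [c / 10 for c in range(0, 1001)]            # candidate centres 0.0 .. 100.0
best = min(grid, key=lambda c: average_deviation(values, freqs, c))
print(ad_median, ad_mean, best)
```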

The average deviation is less affected by extreme deviations
than the more popular standard deviation, and for this reason
it probably has greater sampling reliability in samples drawn from
extremely leptokurtic (peaked) populations.

Range. The absolute range of a frequency distribution is the
difference between the highest and lowest values of the distribution.
The relative range is this difference divided by the
standard deviation.

Bivariate and Multivariate Distributions. In the univariate
frequency distributions discussed in the preceding sections, the
data were classified according to a single characteristic. In
bivariate or multivariate distributions, data are classified
according to two or more characteristics. Table 3 is an illustration
of a bivariate frequency distribution. The frequency
distribution showing grades of the 81 Mount Holyoke freshmen
in second-semester English, shown in the total column at the
right, under the caption F, is cross classified to show how each
group of freshmen did in its first-semester English. Thus of the
8 students having second-semester grades between 180 and 200,
row 7 of Table 3 shows that 2 had first-semester grades between
140 and 160, 4 had first-semester grades between 160 and 180,
and 2 had first-semester grades between 180 and 200. This is a
small univariate frequency distribution of the group of students
who had grades between 180 and 200 in their second-semester
course. In Table 3 there are 13 rows and 12 columns, of which
11 (in both instances) contain univariate frequency distributions.
Since there are 11 subgroups of 11 groups, there are altogether
121 classes, represented by 121 squares or cells in the table, of
which 28 contain frequencies.

Table 3. A Bivariate Frequency Distribution of 81 Mount Holyoke
Freshmen According to Their Grades in First- (X₂) and
Second- (X₁) Semester English




The characteristics of a bivariate frequency distribution can
be described by various statistics. Many of these are the same
as the statistics employed in the description of a univariate frequency
distribution, and some are new. Thus, the central
tendency of one of the two variables may be measured by its
mean, or its mode, or its median. Similarly, the dispersion of
this variable may be measured by its range, its standard deviation,
its average deviation, or its quartile deviation; and its
skewness and kurtosis may be measured by β₁ and β₂, respectively.
The same is true of the other variable and of the numerous
univariate frequency distributions that make up the details
of a single bivariate distribution, i.e., each frequency distribution
of the rows and columns.

Fig. 6. Progression of the means of X₂ with changes in X₁.

Progressions of Means. If the data are grouped in the form
of a bivariate scatter diagram such as Table 3, one way to measure
the association between the two variables is to compute the
mean values of one variable for various values of the other, i.e.,
the means of the rows and the means of the columns of Table 3.
The means of the columns show how the X₁ variable tends to
change, on the average, with changes in X₂; and the means of the
rows show how the X₂ variable tends to change, on the average,
with changes in X₁. These means have been computed and
shown, respectively, in Figs. 5 and 6, in which the means are connected
by a series of straight lines; these are called “polygons of
regression.”
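
The mean of one row of a bivariate table is computed exactly as for any frequency distribution. The Python sketch below uses the one row of Table 3 that the text describes (row 7) and assumes, as in Table 2, that the mid-point of each class interval stands for the grades in it; the function name is hypothetical. Repeating the calculation for every row and every column would give the progressions of means, but the full table is not reproduced here.

```python
def conditional_mean(midpoints, freqs):
    """Mean of one variable within a single row or column (hypothetical name)."""
    n = sum(freqs)
    return sum(m * f for m, f in zip(midpoints, freqs)) / n


# Row 7 of Table 3 as described in the text: the 8 students with second-semester
# grades (X1) between 180 and 200, classified by first-semester grade (X2).
x2_midpoints = [150, 170, 190]   # mid-points of 140-160, 160-180, 180-200
row7_freqs = [2, 4, 2]
row7_mean_x2 = conditional_mean(x2_midpoints, row7_freqs)
print(row7_mean_x2)
```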

Lines of Regression. The tendency of the progressions of the
means to follow straight lines suggests the following hypothesis.
Suppose that X₁ is so related to X₂ that an increase in X₂ of one
unit always produces an increase in X₁ of, say, b units, b being a
constant. If X₂ were the only factor affecting X₁, all the
values of X₁, when plotted, would fall exactly on a straight
line and the progression of all means would be perfectly linear;
all of them would be on the line representing the equation
X₁′ = 47.58 + .8322X₂ shown in Fig. 7.

If, however, other forces also affect X₁, causing it to be higher or lower than the value expected from its association with X₂, the actual values of the means would not fall on the straight line but would be scattered about the line in the manner shown in Fig. 7. According to such a hypothesis, a straight line fitted to the data should give the law of relationship between X₁ and X₂, and the scatter about the line should give the deviation from this line caused by the other factors affecting X₁.

Fig. 8.—The fitting of the line of regression of X₂ on X₁ by the method of least squares (horizontal deviations minimized).

A similar view could be taken of the variation in the mean value of X₂ with changes in X₁, and would justify drawing a straight line to show the law of relationship between X₂ and X₁. This is illustrated in Fig. 8.
The lines that are derived to show the relationship between the mean value of one variable and the value of another are called “lines of regression,” following the terminology of Francis Galton, who used this term in his original study of the relationship between the heights of children and the heights of their parents.¹ A line of regression of one variable on another is to be interpreted as indicating the values of the first, the dependent variable, that would be obtained for various values of the second, the independent variable, if no other forces were affecting the dependent variable.
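A minimal sketch of fitting both lines of regression by least squares, assuming the usual solution of the normal equations in deviation form, b = Σ(dependent × independent)/Σ(independent²) and a = mean(dependent) − b × mean(independent); the text itself defers the derivation to p. 19. The five pairs of observations are hypothetical.

```python
def regression_line(dep, ind):
    """Least-squares line dep' = a + b * ind (deviations measured in dep are minimized).

    Works on deviations from the means: b = sum(d * i) / sum(i * i),
    then a = mean(dep) - b * mean(ind).
    """
    n = len(dep)
    mean_dep = sum(dep) / n
    mean_ind = sum(ind) / n
    s_cross = sum((u - mean_dep) * (v - mean_ind) for u, v in zip(dep, ind))
    s_ind = sum((v - mean_ind) ** 2 for v in ind)
    b = s_cross / s_ind
    a = mean_dep - b * mean_ind
    return a, b


# Hypothetical paired observations.
X1 = [2, 4, 5, 4, 5]
X2 = [1, 2, 3, 4, 5]

a12, b12 = regression_line(X1, X2)   # line of regression of X1 on X2
a21, b21 = regression_line(X2, X1)   # line of regression of X2 on X1
```

Here the two lines have different slopes (0.6 for X₁ on X₂, 1.0 for X₂ on X₁): one minimizes vertical deviations and the other horizontal ones, so they coincide only when the correlation is perfect.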

First-order Standard Deviations. In dealing with bivariate and multivariate frequency distributions, it becomes necessary to differentiate among zero-order standard deviations, first-order standard deviations, second-order standard deviations, etc. The zero-order standard deviations are standard deviations of the original variables. The first-order, second-order, and higher-order standard deviations are standard deviations of variation from lines and planes of regression.

In the case of the univariate distribution, the representativeness of the mean depended upon how closely the cases were scattered around this mean value. This scatter was measured by the zero-order standard deviation. Similarly, in the case of a line of regression in a bivariate distribution, the representativeness of the line as a measure of the law of relationship between the two variables depends on the scatter of cases above and below the line or to the right and to the left of it. The representativeness of the line of regression depicting the equation X′₁ = 47.58 + .8322X₂, used in Fig. 7, may be indicated by the scatter of cases above and below it. A measure of this scatter would be the standard deviation of the vertical deviations (of individual cases, not of means) from the line. The standard deviation about the line X′₂ = −5.548 + .9642X₁, shown in Fig. 8, would be the standard deviation of the horizontal deviations from that line, made by the scatter of cases (not by the scatter of means) about the line. These two standard deviations are called “first-order standard deviations.”

The first-order standard deviations will always be less than the zero-order standard deviations (or, when the variables are uncorrelated, equal to them), for the part of the variation represented by the line of regression has been eliminated by taking the deviations from the line.

¹ Cf. p. 19 for the method used to derive the equations for these lines.

If the equation for the line of regression is X′₁ = a₁.₂ + b₁₂X₂, the first-order standard deviations are found as follows:

$$N\sigma_{1.2}^2 = \sum X_1^2 - a_{1.2}\sum X_1 - b_{12}\sum X_1X_2 \tag{12}$$

and

$$N\sigma_{2.1}^2 = \sum X_2^2 - a_{2.1}\sum X_2 - b_{21}\sum X_1X_2 \tag{12'}$$

Sometimes these first-order standard deviations are called 
“standard errors of estimate” since they indicate the error 
involved in using the line of regression as an estimate of the 
dependent variable. 
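To check Eq. (12) numerically, the sketch below computes σ₁.₂ twice for a hypothetical set of five pairs: once from the sums in Eq. (12), and once directly as the root-mean-square of the vertical deviations of the individual cases from the least-squares line. The data and the function names are illustrative assumptions, not taken from the text.

```python
import math


def _least_squares(x1, x2):
    """Constants a12, b12 of the line X1' = a12 + b12 * X2 fitted by least squares."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    b12 = (sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
           / sum((v - m2) ** 2 for v in x2))
    return m1 - b12 * m2, b12


def first_order_sd_eq12(x1, x2):
    """sigma_{1.2} from Eq. (12): N s^2 = sum X1^2 - a12 sum X1 - b12 sum X1 X2."""
    a12, b12 = _least_squares(x1, x2)
    n_var = (sum(u * u for u in x1) - a12 * sum(x1)
             - b12 * sum(u * v for u, v in zip(x1, x2)))
    return math.sqrt(n_var / len(x1))


def first_order_sd_direct(x1, x2):
    """sigma_{1.2} as the root-mean-square vertical deviation from the fitted line."""
    a12, b12 = _least_squares(x1, x2)
    residuals = [u - (a12 + b12 * v) for u, v in zip(x1, x2)]
    return math.sqrt(sum(e * e for e in residuals) / len(x1))


X1 = [2, 4, 5, 4, 5]   # hypothetical data
X2 = [1, 2, 3, 4, 5]
```

Both functions give σ₁.₂ = √0.48 ≈ 0.693 for these data; Eq. (12) works because the least-squares constants make the residuals sum to zero and be uncorrelated with X₂.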

The Pearsonian Coefficient of Correlation. The progression of the means and the lines of regression described above are concerned with depicting the “law of relationship” between the two variables. They give the average value of one variable associated with given values of the other variable and show how these average values tend to change in unison with the other variable. Another statistic, called the “Pearsonian coefficient of correlation,” aims to measure the degree of association between the two variables. This coefficient of correlation is found by the equation

$$r_{12} = \frac{\sum x_1x_2}{N\sigma_1\sigma_2} \tag{13}$$

in which x₁ and x₂ refer to deviations from the means and N to the number of pairs of cases.
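Eq. (13) translates directly into code. The sketch below uses the same convention as the text (standard deviations computed with divisor N) and the same hypothetical five pairs as before; the function name is illustrative.

```python
import math


def pearson_r(x1, x2):
    """Eq. (13): r12 = sum(x1 * x2) / (N * sigma1 * sigma2), x's being deviations from means."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [u - m1 for u in x1]
    d2 = [v - m2 for v in x2]
    sigma1 = math.sqrt(sum(d * d for d in d1) / n)   # zero-order standard deviations
    sigma2 = math.sqrt(sum(d * d for d in d2) / n)
    return sum(p * q for p, q in zip(d1, d2)) / (n * sigma1 * sigma2)


X1 = [2, 4, 5, 4, 5]   # hypothetical data
X2 = [1, 2, 3, 4, 5]
r12 = pearson_r(X1, X2)   # about 0.7746 here, so r12 ** 2 is 0.6
```

The coefficient is symmetric in the two variables, so `pearson_r(X2, X1)` returns the same value.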

Relationship between r and the First-order Standard Deviation. If the variables are measured from their mean values, Eq. (12) becomes

$$N\sigma_{1.2}^2 = \sum x_1^2 - b_{12}\sum x_1x_2 \qquad \text{since } \sum x_1 = 0$$

If the value of b₁₂ is determined by the method of least squares, as Eqs. (12) and (12′) presuppose,¹

$$b_{12} = \frac{\sum x_1x_2}{\sum x_2^2} = r_{12}\frac{\sigma_1}{\sigma_2} \qquad \text{since } \sum x_1x_2 = N\sigma_1\sigma_2 r_{12} \text{ and } \sum x_2^2 = N\sigma_2^2$$

and, because Σx₁² = Nσ₁², the value of Nσ²₁.₂ may be written

$$N\sigma_{1.2}^2 = N\sigma_1^2 - N\sigma_1^2 r_{12}^2$$

so that

$$\sigma_{1.2}^2 = \sigma_1^2\,(1 - r_{12}^2)$$

and, finally,

$$\sigma_{1.2} = \sigma_1\sqrt{1 - r_{12}^2} \tag{14}$$

In the same manner,

$$\sigma_{2.1} = \sigma_2\sqrt{1 - r_{12}^2} \tag{14'}$$

¹ Cf. Smith and Duncan, op. cit., pp. 331-333.

These equations, it is to be noted, may be put in the following form:

$$r_{12}^2 = 1 - \frac{\sigma_{1.2}^2}{\sigma_1^2} \tag{15}$$

$$r_{12}^2 = 1 - \frac{\sigma_{2.1}^2}{\sigma_2^2} \tag{15'}$$


From this form [Eq. (15)] it is readily seen that r is closely related to the first-order variance, i.e., to the scatter about the line of regression. If the scatter about the line of regression is a small percentage of the total scatter, this signifies that the line of regression itself accounts for a large part of the variation in X₁, and r₁₂ is high (in absolute value). If the scatter about the line of regression (the first-order variance) is a large percentage of the total variance in the dependent variable, this signifies that only a small part of the variation is shown in the line of regression, and r₁₂ is low. That is, the better the line of regression fits the data, the higher the numerical value of r, and vice versa. The Pearsonian coefficient of correlation is thus a measure of the goodness of fit of the lines of regression.
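Eqs. (14), (14′), and (15) can be confirmed numerically. The sketch below computes, for the same hypothetical five pairs, the zero-order standard deviations, r₁₂, and both first-order standard deviations directly from the residuals about the two least-squares lines (in deviation form, so no intercepts are needed); the function name `summary` is illustrative.

```python
import math


def summary(x1, x2):
    """Zero-order sigmas, r12, and first-order sigmas sigma_{1.2}, sigma_{2.1}."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [u - m1 for u in x1]
    d2 = [v - m2 for v in x2]
    s11 = sum(d * d for d in d1)
    s22 = sum(d * d for d in d2)
    s12 = sum(p * q for p, q in zip(d1, d2))
    sigma1, sigma2 = math.sqrt(s11 / n), math.sqrt(s22 / n)
    r12 = s12 / (n * sigma1 * sigma2)
    # Slopes of the two least-squares lines, working with deviations from the means.
    b12, b21 = s12 / s22, s12 / s11
    sigma_1_2 = math.sqrt(sum((p - b12 * q) ** 2 for p, q in zip(d1, d2)) / n)
    sigma_2_1 = math.sqrt(sum((q - b21 * p) ** 2 for p, q in zip(d1, d2)) / n)
    return sigma1, sigma2, r12, sigma_1_2, sigma_2_1


X1 = [2, 4, 5, 4, 5]   # hypothetical data
X2 = [1, 2, 3, 4, 5]
sigma1, sigma2, r12, sigma_1_2, sigma_2_1 = summary(X1, X2)
```

For these data σ₁.₂² = 0.48 and σ₂.₁² = 0.8, which equal σ₁²(1 − r₁₂²) = 1.2 × 0.4 and σ₂²(1 − r₁₂²) = 2 × 0.4.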

The Pearsonian Coefficient of Correlation and the Breakup of Variance. For every point on a bivariate scatter diagram such as Fig. 7, there is a corresponding point on the line of regression of X₁ on X₂. Geometrically, the point on the line is obtained by projecting the point of the scatter diagram vertically onto the line of regression. This is illustrated by points P and P′ in Fig. 7. Algebraically, the X₁ coordinate of a point on the line of regression is found by substituting the given value of X₂ in the regression equation X′₁ = a₁.₂ + b₁₂X₂ or, if the X's are both in terms of deviations from their respective means,

$$x_1' = r_{12}\frac{\sigma_1}{\sigma_2}\,x_2$$

When the variables are measured from their respective mean values, the mean of the various values of x₂ is zero. Hence the mean of the corresponding values of x′₁ is zero also; and the variance (the square of the standard deviation) of these x′₁ values is accordingly as follows:

$$\sigma_{x_1'}^2 = \frac{\sum (x_1')^2}{N} = r_{12}^2\,\frac{\sigma_1^2}{\sigma_2^2}\cdot\frac{\sum x_2^2}{N} = r_{12}^2\,\frac{\sigma_1^2}{\sigma_2^2}\,\sigma_2^2 = r_{12}^2\,\sigma_1^2$$

This describes the part of the variance in X₁ that is depicted by the line of regression; and it is to be noted that Eq. (15) may consequently be written

$$\sigma_{1.2}^2 = \sigma_1^2 - \sigma_{x_1'}^2$$

or, in other words,

$$\sigma_1^2 = \sigma_{x_1'}^2 + \sigma_{1.2}^2 \tag{16}$$

$$\sigma_2^2 = \sigma_{x_2'}^2 + \sigma_{2.1}^2 \tag{16'}$$

Equation (16) says that the total variance in X₁ values is equal to the variance of the corresponding points on the line of regression plus the variance of the deviations from these points.¹ Another way of looking at this is that the total variance in X₁ is made up of two parts, one consisting of the variance due to its association with X₂ as represented by the line of regression (σ²x′₁), the other representing the variance of X₁ due to its association with factors independent of X₂ (σ²₁.₂).

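It was also found just above that σ²x′₁ = r²₁₂σ²₁; hence,

$$r_{12}^2 = \frac{\sigma_{x_1'}^2}{\sigma_1^2} \tag{17}$$

$$r_{12}^2 = \frac{\sigma_{x_2'}^2}{\sigma_2^2} \tag{17'}$$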


These equations shed further light on the meaning of r, consistent with the explanation of the significance of Eq. (16). Equation (17) shows that r²₁₂ measures the proportion of the total variance in X₁ that is due to its association with X₂. It also measures the proportion of the total variance in X₂ that is due to its association with X₁, as indicated in Eq. (17′).
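The breakup of variance in Eqs. (16) and (17) can be checked on the same hypothetical five pairs: the sketch splits σ₁² into the variance of the points x′₁ on the line of regression and the variance of the deviations from those points, then takes their ratio. The function name is illustrative.

```python
def variance_breakup(x1, x2):
    """Split the variance of X1 into the part on the regression line and the scatter about it."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [u - m1 for u in x1]
    d2 = [v - m2 for v in x2]
    b12 = sum(p * q for p, q in zip(d1, d2)) / sum(q * q for q in d2)
    fitted = [b12 * q for q in d2]                                   # x1' for each case
    total = sum(p * p for p in d1) / n                               # sigma_1^2
    on_line = sum(f * f for f in fitted) / n                         # sigma_{x1'}^2
    about_line = sum((p - f) ** 2 for p, f in zip(d1, fitted)) / n   # sigma_{1.2}^2
    return total, on_line, about_line


X1 = [2, 4, 5, 4, 5]   # hypothetical data
X2 = [1, 2, 3, 4, 5]
total, on_line, about_line = variance_breakup(X1, X2)
r12_squared = on_line / total   # Eq. (17)
```

Here 1.2 = 0.72 + 0.48, and 0.72/1.2 = 0.6 agrees with r₁₂² obtained from Eq. (13).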

The Correlation Ratio. The correlation ratio is a measure of correlation designed to be used in cases where the correlation between the two variables is nonlinear. In such instances it is not appropriate to compute a straight line of regression, but an analysis similar to that outlined above may be applied. This latter analysis makes use of the polygon of regression rather than the line of regression; i.e., it makes use of the means of the columns and the means of the rows. The first-order variance is calculated by measuring the deviation of each case in a column from its column mean, squaring these deviations, summing the squares over all columns, and dividing by N; this is the variance in X₁ due to association with factors independent of X₂. The amount of variance in X₁ explained by association or correlation with X₂ is then the variance of the column means (each weighted by its column frequency) about their mean, which is the mean of the whole distribution, namely, X̄₁. The ratio of either of these variances to the total variance can be used to measure nonlinear correlation, and the ratio is called the “correlation ratio.” The symbol used is the Greek lower-case letter eta (η). While it has been seen that



PART I 

General Theory of Frequency Curves 

CHAPTER II 

PROBABILITY AND THE PROBABILITY CALCULUS 

The theory of frequency curves is in large measure a special application of the theory of probability. An understanding of probability and the probability calculus is therefore essential for a study of the theory of frequency curves. Probability theory in turn is principally based on combinatorial analysis. This chapter on probability and the probability calculus will accordingly begin with a review of permutations and combinations.

Permutations and Combinations. Permutations. A permutation is an arrangement. If there are N things, they may be arranged in N! different ways; for the first selection may be made in N ways, the second selection in N − 1 ways, the third in N − 2 ways, etc. Hence the total number of arrangements, or permutations, that may be made of N things is

$$N(N-1)(N-2)\cdots 1 = N!$$

For example, the number of different permutations of five things is 5! = 5 × 4 × 3 × 2 × 1 = 120.

Sometimes in forming permutations only some of the objects can be selected at one time. Suppose, for example, that an arrangement is to consist of three objects, but there are five objects from which to choose. In this instance the total number of different arrangements that can be made by the selection of three objects from the group of five will equal 5 × 4 × 3 = 60, for there are five ways of selecting the first object, four ways of selecting the second, and three ways of selecting the third.

In general, the number of permutations of N things taken r at a time is

$$P_r^N = N(N-1)(N-2)\cdots(N-r+1) = \frac{N!}{(N-r)!} \tag{1}$$
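Eq. (1) can be checked against a brute-force listing of arrangements. The sketch below forms the product N(N − 1)⋯(N − r + 1) directly and compares it with the number of ordered selections produced by Python's `itertools.permutations`; the standard library also offers `math.perm` (Python 3.8+) for the same count. The function name is illustrative.

```python
import math
from itertools import permutations


def permutations_count(n, r):
    """P_r^N = N(N - 1)...(N - r + 1), Eq. (1)."""
    count = 1
    for factor in range(n, n - r, -1):   # N, N - 1, ..., N - r + 1
        count *= factor
    return count


# The text's example: five objects taken three at a time.
listed = list(permutations("abcde", 3))   # every ordered arrangement, e.g. ('a', 'b', 'c')
```

Both routes give 60 arrangements of five objects taken three at a time, and 5! = 120 when all five are taken.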
Combinations. Some of the permutations of Eq. (1) will be constituted alike; they will merely be different arrangements of the same combination of things. For example, suppose the five objects are the letters a, b, c, d, e. If permutations are made of these five letters three at a time, two of these permutations would be abc and acb, which are the same combination of the letters a, b, and c arranged in different order. To find the number of different combinations of three letters each that may be made from the group of five letters, it is necessary to allow for the number of permutations that can be made from any one combination. In the preceding section it was found that a given set of N objects can be arranged in N! ways. Hence every combination of three letters can be made to yield 3! = 3 × 2 × 1 = 6 permutations without changing the combination. It follows that the total number of combinations of three letters each that may be made from a group of five letters is equal to the total number of permutations of five things taken three at a time [N!/(N − r)! = 5!/2!] divided by the number of permutations of three things taken all at a time (r! = 3!); that is, $C_3^5 = \frac{5!}{2!\,3!} = 10$.

In general, the number of combinations of N things taken r at a time equals

$$C_r^N = \frac{N!}{r!\,(N-r)!} \tag{2}$$


The last equation may also be looked upon as the number of different combinations that may be made of N things by putting r of them in one category and N − r in another. For obviously the number of different combinations that may be made of 10 men by putting 3 of them on a committee and leaving 7 of them off is the same as the number of different committees of 3 that may be picked from 10 men.
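The argument that each combination accounts for r! permutations, and the committee example, can both be verified by enumeration. The sketch below uses `itertools.combinations` and `itertools.permutations` from the Python standard library; the helper name is illustrative.

```python
import math
from itertools import combinations, permutations


def combinations_count(n, r):
    """C_r^N = N! / (r! (N - r)!), Eq. (2)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


letters = "abcde"
combos = list(combinations(letters, 3))          # unordered selections, e.g. ('a', 'b', 'c')
arrangements = list(permutations(letters, 3))    # ordered selections
# Each combination yields 3! orderings, so dividing the number of
# permutations by 3! leaves the number of distinct combinations.
```

The enumeration gives 10 combinations of the five letters three at a time, and C(10, 3) = C(10, 7) = 120 committees, matching the equivalence between "picked" and "left off" described above.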

This new way of looking at the problem is especially helpful when more than two categories are involved. Suppose, for example, that three committees are to be chosen from 10 men, say a finance committee consisting of 3 men, a production committee consisting of 5 men, and a personnel committee consisting of 2 men. If each man is to serve on a committee, but no man is to serve on more than one committee, how many different committees of this kind could be formed from the 10 men? Here it is a question of how many different combinations can be made of 10 things, 3 of them to be placed in one category, 5 in another, and 2 in another.

The answer to this broader question is obtained in the same way as the solution of the two-category case. The total number of different ways of picking 10 men in order from 10 men is 10!. But not all these different ways of picking 10 men will lead to differently constituted committees. For any one set of committees could be picked in (3!)(5!)(2!) different ways without changing the make-up of the committees. Thus, if the finance committee consisted of Mr. A, Mr. G, and Mr. J, the committee could be picked in that order, or in the order A, J, G, or the order G, A, J, or G, J, A, or J, A, G, or finally J, G, A. But each of these 3! ways of picking the finance committee could be combined with the 5! ways of picking the production committee, which would make (3!)(5!) different ways of picking these two committees. Finally, each of these (3!)(5!) ways of picking the finance and production committees could be combined with the 2! ways of picking the personnel committee, making a total of (3!)(5!)(2!) different ways of selecting these three committees without changing their ultimate constituency. It follows, therefore, that the total number of different committees that can be picked is equal to 10!/(3! 5! 2!). In general, the number of different combinations that may be made of N things by putting N₁ of them in one category, N₂ of them in another category, N₃ of them in a third category, and so on up to Nₖ of them in a kth category, where

$$N_1 + N_2 + N_3 + \cdots + N_k = N$$

is

$$C_{N_1, N_2, \ldots, N_k}^N = \frac{N!}{N_1!\,N_2!\cdots N_k!} \tag{3}$$
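Eq. (3) applied to the committee example, with a cross-check that picks the committees one after another (finance from 10 men, production from the remaining 7, personnel from the last 2). The function name is illustrative; `math.comb` is the standard-library binomial coefficient (Python 3.8+).

```python
import math


def multinomial_count(*group_sizes):
    """N! / (N1! N2! ... Nk!), Eq. (3), N being the sum of the group sizes."""
    result = math.factorial(sum(group_sizes))
    for size in group_sizes:
        result //= math.factorial(size)   # each division is exact
    return result


# Finance (3), production (5), and personnel (2) committees from 10 men.
committees = multinomial_count(3, 5, 2)
# Cross-check: choose the finance committee, then production from the
# remaining 7; the personnel committee is whoever is left.
sequential = math.comb(10, 3) * math.comb(7, 5) * math.comb(2, 2)
```

Both calculations give 2,520 different sets of committees; with only two categories Eq. (3) reduces to Eq. (2).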

The Binomial Expansion. An important use of the combinatorial analysis is in the derivation of a formula for the expansion of the binomial $(a + b)^N$. The expression $(a + b)^N$ means the multiplication of (a + b) by itself N times. Terms of the product are thus formed by multiplying the a's of a various number of factors by the b's of the remaining number of factors. The product $a^r b^{N-r}$ will therefore appear $C_r^N$ times, since this is the number of different ways in which r of the a's can be selected from N factors, no consideration being given to the order of selection.

The equation for the expansion of the binomial $(a + b)^N$ is accordingly

$$(a+b)^N = C_N^N a^N + C_{N-1}^N a^{N-1}b + \cdots + C_{N-r}^N a^{N-r}b^r + \cdots + C_0^N b^N \tag{4}$$

This is known as the “binomial expansion.” As will be seen in the next chapter, there is a frequency distribution whose relative frequencies are computed in the same way as the terms of the binomial expansion. It is consequently known as the “binomial distribution.”
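The statement that the coefficient of $a^{N-r}b^r$ is $C_{N-r}^N$ (which equals $C_r^N$) can be checked by actually multiplying out $(a + b)^N$ one factor at a time, as in this sketch. The function name is illustrative; the comparison uses the standard-library `math.comb`.

```python
import math


def expand_binomial(n):
    """Coefficients of a^(n - r) * b^r in (a + b)^n, for r = 0, 1, ..., n.

    Multiplies by (a + b) one factor at a time: the new term with r b's collects
    a * (old term with r b's) + b * (old term with r - 1 b's).
    """
    coeffs = [1]                       # (a + b)^0 = 1
    for _ in range(n):
        padded = coeffs + [0]
        coeffs = [padded[r] + (padded[r - 1] if r > 0 else 0)
                  for r in range(len(padded))]
    return coeffs
```

For example, `expand_binomial(5)` returns `[1, 5, 10, 10, 5, 1]`, the combinatorial counts $C_r^5$ for r = 0 to 5.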

The Multinomial Expansion. An argument similar to that just outlined shows that the coefficient of the general term $X_1^{N_1}X_2^{N_2}\cdots X_k^{N_k}$ in the expansion of the multinomial $(X_1 + X_2 + X_3 + \cdots + X_k)^N$ is given by Eq. (3). There is also a frequency distribution whose relative frequencies are computed in the same way as the terms of this multinomial expansion, and it is hence called a “multinomial distribution.”¹

Mathematical Probability. Definition. If, in a given set of t objects, m possess a given property and n do not possess this property, the probability of an object of this set having the given property is m/t, or the relative frequency of these objects in the set. The word “object” may include events that have the property of occurring or even propositions that have the property of being true, as well as material or immaterial things. To illustrate probability by a simple case, consider an ordinary deck of 52 playing cards containing 13 cards in each suit, so that the probability of a heart is 13/52 = 1/4. This is also the probability of a diamond, spade, or club in an ordinary deck of cards. In this illustration the probability set is finite. The definition of probability as a relative frequency is equally valid, however, for infinite probability sets.

In defining a probability, care must always be taken to see that the set of objects is precisely designated and that the property of the object to which the probability refers is carefully distinguished. It is obvious that the probability of an ace in an ordinary deck of cards is not the same as the probability of an ace in a pinochle deck, since the latter contains 48 cards of which 8 are aces, while an ordinary deck contains 52 cards of which 4 are aces. If a probability of an ace is defined with reference to an ordinary deck, it must not in subsequent stages of the analysis be taken as referring to a pinochle deck. This should be clear to anyone; yet the pitfalls of such a shift lie athwart the path to more intangible analysis, a path that is strewn with the intellectual bones of the unwary.

¹ See Chap. XIII.

In this connection it is to be noted that the probability of a heart in an ordinary deck, say, is not necessarily the same as the probability of drawing a heart from an ordinary deck. The first probability refers to a set of 52 cards, 13 of which are hearts, so that the probability of a heart in an ordinary deck is clearly 13/52 = 1/4. The second probability refers to a set of drawings from an ordinary deck. The probability of a heart in this set of drawings is the relative frequency of the number of hearts in the total set of cards drawn. The precise value of this second probability cannot be given until the number of cards drawn and the number of hearts among them have been counted. It may be 1/4, or it may be some other value.

Law of Large Numbers. That the probability of drawing a heart from an ordinary deck is commonly said to be 1/4 is the outcome of experience with certain kinds of mass phenomena. It has been found that if cards are drawn at random from an ordinary deck, the card being replaced and the deck reshuffled after each draw, the ratio of the number of hearts to the total number of cards drawn tends to approximate 1/4 whenever the number of drawings is large. This experience with mass random¹ phenomena is generalized in the law of large numbers.
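The experiment described above can be imitated with a pseudo-random number generator. This sketch assumes that Python's `random` module is an adequate stand-in for drawing with replacement and reshuffling; the seed is arbitrary and fixed only so that runs are reproducible, and the function name is illustrative.

```python
import random


def relative_frequency_of_hearts(n_draws, seed=1):
    """Draw with replacement from a 52-card deck and return the proportion of hearts."""
    rng = random.Random(seed)          # seeded so runs are reproducible
    deck = ["heart"] * 13 + ["diamond"] * 13 + ["spade"] * 13 + ["club"] * 13
    hearts = 0
    for _ in range(n_draws):
        # Replacing the card and reshuffling makes every draw a fresh,
        # equally likely choice among all 52 cards.
        if rng.choice(deck) == "heart":
            hearts += 1
    return hearts / n_draws


if __name__ == "__main__":
    for n in (20, 200, 2_000, 200_000):
        print(n, relative_frequency_of_hearts(n))
```

With a few dozen draws the proportion can stray well away from 1/4; with hundreds of thousands it typically settles within a fraction of a percentage point of 1/4, which is the behavior the law of large numbers describes.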

The law of large numbers says that, when a large number of random events is involved, it is possible to predict with reasonable accuracy the relative frequency of recurrence of a particular event by calculating a certain mathematical probability ascertainable from a carefully defined probability set. This law is the link between abstract calculations of probability and the prediction of relative frequencies of events in real life. With the assumption of its validity, the principal problem in most cases becomes that of finding the mathematical model that is the correct one for the particular phenomenon in question.

The probabilities in some sets are unknown but have been empirically approximated. Thus, when the empirically determined probability of a man dying at age fifty is used in practice by life-insurance companies as a basis for prediction, experience shows that these empirically determined probabilities yield good predictions if a large enough number of men is involved. With large masses of data, therefore, empirically determined probabilities may be used to make predictions in the same way as the known probabilities of a given set, illustrated above.

¹ For a discussion of “randomness” see pp. 154-162.

Probability Distributions. Definitions. Any ordinary frequency distribution may be expressed in the form of a probability distribution by describing the frequencies as percentages of the total number of cases. A probability distribution can therefore be discrete or continuous. A continuous frequency curve that represents the distribution of relative frequency of an infinite population of cases is also a probability curve. Accordingly, all the measures of the various characteristics of frequency distributions, which have been summarized in the preceding chapter, apply to probability distributions; thus a probability distribution has a mean, a standard deviation, a coefficient of skewness, and a coefficient of kurtosis, like any frequency distribution. There are also bivariate and multivariate distributions of probability, corresponding to bivariate and multivariate frequency distributions.

Probability Equations. Probability equations may be written in two ways. If the distribution is discrete, the probability of the attribute X may be written simply Y = φ(X). The probability in this case is represented by the height of the ordinate Y at the abscissa point X. If the distribution is continuous, the equation Y = φ(X) serves as an algebraic description of the curve but it is not a true measure of the probability. It merely represents the height of the curve at an abscissa point X.

For continuous distributions the proper form for representing probability is d(F/N) = φ(X) dX, or dP = φ(X) dX. In this form the probability, or relative frequency, of a case lying between X and X + dX is expressed as a function of the attribute X. A probability, or frequency, curve is the limit approached by an area histogram as the class interval is made infinitesimally small. Thus the expression d(F/N) = φ(X) dX or dP = φ(X) dX merely says that, when the class interval (of size dX) is made infinitesimally small, the area under the curve for any class interval [that is, d(F/N) or dP] is approximately equal to the area of a small rectangle whose base is dX and whose height is the ordinate of the curve [φ(X)] at some arbitrary value of X within the interval.

As an illustration, consider the normal probability curve. The algebraic equation for the normal curve is

$$Y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(X-\bar{X})^2/2\sigma^2} \tag{5}$$

This expresses the ordinate Y simply as a function of the abscissa X. The probability, however, of a normal variate lying between X and X + dX is given by the equation

$$dP = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(X-\bar{X})^2/2\sigma^2}\,dX \tag{5'}$$

In both cases e (= 2.7183…) is the base of the Napierian system of logarithms, X̄ is the mean of the distribution, and σ its standard deviation.
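The distinction between the ordinate Y of Eq. (5) and the probability element dP of Eq. (5′) can be seen numerically: adding up φ(X) dX over many narrow strips reproduces the probability of an interval, while a single ordinate can exceed 1 and so cannot itself be a probability. The sketch uses midpoint ordinates for the strips and assumes Python's `math.erf` for the exact comparison value; the strip count and function names are arbitrary choices.

```python
import math


def normal_ordinate(x, mean, sd):
    """Eq. (5): height Y of the normal curve at abscissa x."""
    return math.exp(-(x - mean) ** 2 / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))


def prob_between_by_strips(x_lo, x_hi, mean, sd, n_strips=10_000):
    """Sum of dP = phi(X) dX over many narrow strips, using each strip's midpoint ordinate."""
    dx = (x_hi - x_lo) / n_strips
    return sum(normal_ordinate(x_lo + (k + 0.5) * dx, mean, sd) * dx
               for k in range(n_strips))


def prob_between_exact(x_lo, x_hi, mean, sd):
    """Area under the normal curve between x_lo and x_hi, via the error function."""
    def z(x):
        return (x - mean) / (sd * math.sqrt(2))
    return 0.5 * (math.erf(z(x_hi)) - math.erf(z(x_lo)))
```

For a unit normal curve both routes give about 0.6827 for the interval from one standard deviation below the mean to one above it, whereas `normal_ordinate(0.0, 0.0, 0.1)` is roughly 3.99, a height rather than a probability.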

The Probability Calculus. The Addition Theorem. If the attributes of a given probability set are X₁, X₂, …, Xₛ (representing either qualitative or quantitative characteristics) and their probabilities are p₁, p₂, …, pₛ, then the probability of X₁, X₂, or X₃, say, i.e., the attribute of being any one of these X's, is p₁ + p₂ + p₃. If the variation in attributes within the probability set is continuous and if the distribution of probability is described by an equation such as dP = φ(X) dX, the probability of an attribute within any one of a number of small ranges dX whose sum constitutes the range X₁ to X₂ is given by Σ dP = Σ φ(X) dX, or, in the symbolism of the integral calculus,

$$P = \int_{X_1}^{X_2} \varphi(X)\,dX$$

The addition theorem is stated briefly as follows: The probability of either one of two mutually exclusive events (attributes) is the sum of their individual probabilities.

In using this theorem care must be taken to interpret correctly the term “mutually exclusive.” What the term signifies is that the addition theorem applies only to probabilities of one and the same probability set. For the attributes of a probability set are by definition mutually exclusive. Hence the effect of this term is to warn the student to define carefully his probability set at the start. It is to be noted that, while the attributes of a given probability set are mutually exclusive, not all mutually exclusive attributes are members of the same probability set.¹

The Multiplication Theorem. The multiplication theorem pertains to the calculation of a probability of a derived, or second-order, probability set from the probabilities of two or more first-order sets. In deriving the multiplication theorem two cases are distinguished, one pertaining to independent probabilities, the other to dependent probabilities. Consider first the case of independent probabilities.

The numbers of dots on the faces of a die form a set of mutually exclusive attributes, the probability of each of which is 1/6. Two dice form two such probability sets. Each of these may be viewed as a first-order probability set. Consider now the attributes given by the sum of two faces of a pair of dice. If there is no restriction on the way in which the face of one die can be paired with the face of the other die (the condition of independence), then each face of one die can be paired with every face of the other die and the sum of these faces may take on the values 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. If all possible pairs are formed, there will be 6 × 6 = 36 pairs. Only one of these will yield the sum 2, viz., the pair 1 and 1. Hence, in this derived, or second-order, probability set of 36 combinations, the probability of a sum having the value 2 is 1/36. This, however, is equal to the product of the probability of a 1 on one die times the probability of a 1 on the other die, that is, 1/6 × 1/6 = 1/36. This illustrates the multiplication theorem for independent probabilities. It says that, if the probability of attribute B of set II is independent of the probability of attribute A of set I, then the joint probability of A and B in the probability set formed by combining each attribute of set I with every attribute of set II is equal to the product of the probability of A in set I times the probability of B in set II. Succinctly, if P(B) is independent of A, P(A,B) = P(A) × P(B).
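The dice example can be reproduced by enumerating the derived set of 36 pairs with exact fractions. The sketch also applies the addition theorem to mutually exclusive sums within that one derived set; the names `PAIRS` and `prob_of_sum` are illustrative.

```python
from fractions import Fraction
from itertools import product

P_FACE = Fraction(1, 6)                        # each face of a fair die
PAIRS = list(product(range(1, 7), repeat=2))   # every face of die I with every face of die II


def prob_of_sum(total):
    """Probability of a given sum in the derived (second-order) set of 36 pairs."""
    favourable = sum(1 for a, b in PAIRS if a + b == total)
    return Fraction(favourable, len(PAIRS))
```

Enumeration gives `prob_of_sum(2) == Fraction(1, 36)`, the same as `P_FACE * P_FACE`; and since the sums 2, 3, and 12 are mutually exclusive attributes of the same derived set, their probabilities add to 1/36 + 2/36 + 1/36 = 1/9.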

¹ For further discussion, see Smith and Duncan, Elementary Statistics and Applications, p. 269.

If the probability of B is dependent in some way on the occurrence of the attribute A, then the multiplication theorem for independent probabilities cannot be applied. In its place must be put the multiplication theorem for dependent probabilities. This says that the joint probability of A and B is equal to the probability of A times the probability of B given A. Succinctly, P(A,B) = P(A) × P(B/A).¹

¹ For further discussion, see ibid., pp. 269-274.
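As an illustration of the dependent case (an example not taken from the text), consider drawing two cards from an ordinary deck without replacing the first. The chance that the second card is a heart depends on whether the first was. The sketch enumerates every ordered pair of distinct cards and compares the result with P(A) × P(B/A); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

deck = ["H"] * 13 + ["D"] * 13 + ["S"] * 13 + ["C"] * 13

# Ordered pairs of two different cards: the first card is not replaced.
draws = list(permutations(range(52), 2))
both_hearts = sum(1 for i, j in draws if deck[i] == "H" and deck[j] == "H")
p_joint = Fraction(both_hearts, len(draws))

p_first = Fraction(13, 52)             # P(A): first card a heart
p_second_given_first = Fraction(12, 51)  # P(B/A): 12 hearts left among 51 cards
```

The enumeration gives 156 favourable pairs out of 2,652, i.e., 1/17, which equals 13/52 × 12/51; the product 1/4 × 1/4 = 1/16 from the independent-case theorem would be wrong here.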



CHAPTER III 


THE SYMMETRICAL BINOMIAL DISTRIBUTION 
AND THE NORMAL CURVE 


The preceding chapter has provided the tools that are now to 
be employed in shaping a general theory of frequency curves. 
This chapter is the first step in the development of that theory. 

The argument will begin with a study of a simple problem in combinatorial analysis. The basic data will be 10 coins, each marked with a head on one side and a tail on the other. The problem will be to determine the relative frequencies, or probabilities, of various types of combinations in the whole set of combinations that might be made from various arrangements of the 10 coins. Immediate attention will center on the form of this derived distribution of probability, and exact and approximate equations will be determined. Ultimately, consideration will be given to how this combinatorial analysis may explain some of the frequency distributions that appear in real life.

The Symmetrical Binomial Distribution. Derivation. Suppose there are 10 coins all exactly alike and each having a head and a tail. Each of these coins represents an elementary probability set having two attributes, a head and a tail. Since there is only one head and one tail on a coin, the probability of each is 1/2.

The 10 coins may be arranged in various ways so as to form different combinations of heads and tails. The various possible combinations are as follows: no heads, 10 tails; 1 head, 9 tails; 2 heads, 8 tails; 3 heads, 7 tails; 4 heads, 6 tails; 5 heads, 5 tails; 6 heads, 4 tails; 7 heads, 3 tails; 8 heads, 2 tails; 9 heads, 1 tail; and 10 heads, no tails. The probability of the combination no heads, 10 tails, is the product $\tfrac{1}{2} \times \tfrac{1}{2} \times \cdots \times \tfrac{1}{2} = (\tfrac{1}{2})^{10}$.

The probability of the particular combination HTTTTTTTTT is also $(\tfrac{1}{2})^{10}$; but since there are nine other ways in which combinations containing 1 head and 9 tails can be formed, the total probability of a combination containing 1 head and 9 tails is $10(\tfrac{1}{2})^{10}$. In general, the total probability of a combination containing r heads and 10 − r tails is $C_r^{10}(\tfrac{1}{2})^{10}$, which is the formula for the (r + 1)th term in the expansion of the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^{10}$. The probabilities of the various combinations of heads and tails are shown in Table 4.

Table 4.—Probabilities of Various Combinations of Heads and Tails among 10 Coins

Heads (H)   Tails (T)   Probability
    0          10          1/1,024
    1           9         10/1,024
    2           8         45/1,024
    3           7        120/1,024
    4           6        210/1,024
    5           5        252/1,024
    6           4        210/1,024
    7           3        120/1,024
    8           2         45/1,024
    9           1         10/1,024
   10           0          1/1,024
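The entries of Table 4 follow from $C_r^{10}(\tfrac{1}{2})^{10}$ and can be regenerated exactly with Python's `fractions` and `math.comb`; the function name is illustrative.

```python
from fractions import Fraction
from math import comb


def symmetrical_binomial(n):
    """Probabilities C_r^n (1/2)^n of r heads among n coins, for r = 0, 1, ..., n."""
    return [Fraction(comb(n, r), 2 ** n) for r in range(n + 1)]


table_4 = symmetrical_binomial(10)

if __name__ == "__main__":
    for heads, p in enumerate(table_4):
        print(f"{heads:>2} heads {10 - heads:>2} tails  {int(p * 1024):>3}/1,024")
```

The numerators reproduce the column of Table 4 (1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1), and the eleven probabilities sum to exactly 1.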


For N coins, the probability of a combination having N₁ heads and N − N₁ tails is as follows:¹

$$Y = C_{N_1}^N\left(\frac{1}{2}\right)^N \tag{1}$$

or, if N₂ = N − N₁,

$$Y = \frac{N!}{N_1!\,N_2!}\left(\frac{1}{2}\right)^N \tag{2}$$

¹ See p. 24.
Equation (2) is thus the general equation for what is called the “symmetrical binomial distribution.”¹

Characteristics. Mathematical analysis shows that in general the symmetrical binomial distribution has the following
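characteristics:²

$$\bar{X} = \frac{N}{2}, \qquad \sigma = \frac{\sqrt{N}}{2}, \qquad \beta_1 = 0, \qquad \beta_2 = 3 - \frac{2}{N} \tag{3}$$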
The variable, it will be noted, is N₁, the number of heads.

The Normal Curve. Derivation from Binomial Distribution. If 40 coins were used, the distribution of probability of N₁ heads, N − N₁ tails would be considerably more spread out than when 10 coins were tossed. In general, the equation σ = √N/2 indicates that the dispersion of the distribution increases in proportion to √N. If, however, the horizontal scale is reduced and the vertical scale enlarged in the same proportion in which the dispersion of the distribution is increased, then the effect of increasing N is to bring the ordinates of the distribution closer together and to raise them to the height of the original distribution. Under these conditions the tops of the ordinates tend to sketch out a smooth curve as N is increased.

Equations (3) suggest that the curve approached as a limit by the symmetrical binomial as $N$ increases is the normal probability curve. For $\beta_1 = 0$, and $\beta_2 = 3 - 2/N$ approaches 3 as $N$ approaches infinity; these are the values of $\beta_1$ and $\beta_2$ for a normal curve. It can also be shown that if a line is drawn between two adjacent ordinates of the symmetrical binomial, the ratio of the slope of this line to the average of the two ordinates, i.e., the relative slope of the binomial distribution at any mid-point, is

1 The name follows from the fact that it is the equation for the general term of the expansion of $(\tfrac12 + \tfrac12)^N$ (see pp. 25-26).

2 These equations are derived in the Appendix to Chap. IV (pp. 65-67). It will be noticed that the parameters $\bar{X}$, $\sigma$, etc., are in boldface type since they are population values and not sample statistics. This differentiation must be carefully watched in this volume. Cf. Smith and Duncan, Elementary Statistics and Applications, pp. 123-124, 316.

the same as the relative slope of the normal curve that is traced out by the binomial ordinates. 1 Both these suggestions that the limit of the symmetrical binomial is a normal curve are borne out by strict mathematical analysis. 2 Thus the limit of the symmetrical binomial is 3

$$ y = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\left[-\frac{x^2}{2\sigma^2}\right] \qquad (4) $$

Here $x$ represents a deviation from the mean value and equals $N_1 - N/2$.

If $z$ is set equal to $x/\sigma$ and if probabilities are measured by areas instead of ordinates, the equation becomes

$$ dP = \frac{1}{\sqrt{2\pi}}\,\exp\left[-\frac{z^2}{2}\right] dz \qquad (5) $$

which is the equation for a normal curve whose mean is 0 and whose standard deviation is 1. It is subsequently referred to as the standard normal curve. If $N$ is large, therefore, the probability that $N_1$ lies between $N_1'$ and $N_1''$ can be found approximately by computing

$$ z' = \frac{N_1' - N/2}{\sqrt{N}/2} \quad\text{and}\quad z'' = \frac{N_1'' - N/2}{\sqrt{N}/2} $$

and finding the area under the standard normal curve between these limits. The latter is easily accomplished by reference to the normal probability table given in Table VI of the Appendix. 4
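The procedure just described can be sketched in Python as follows. This is a minimal illustration, assuming the standard normal area is obtained from `math.erf` rather than from Table VI; the function names are hypothetical, and the limits 15 and 25 for $N = 40$ coins are an arbitrary example. The exact binomial sum is computed alongside so that the quality of the approximation can be seen.

```python
from math import comb, erf, sqrt


def standard_normal_cdf(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def heads_between_normal(n: int, low: int, high: int) -> float:
    """Normal approximation to P(low <= N1 <= high) for n unbiased coins,
    standardizing the limits by the mean n/2 and the standard deviation sqrt(n)/2."""
    mean, sd = n / 2, sqrt(n) / 2
    return standard_normal_cdf((high - mean) / sd) - standard_normal_cdf((low - mean) / sd)


def heads_between_exact(n: int, low: int, high: int) -> float:
    """Exact symmetrical-binomial probability of low <= N1 <= high."""
    return sum(comb(n, r) for r in range(low, high + 1)) / 2**n


approx = heads_between_normal(40, 15, 25)
exact = heads_between_exact(40, 15, 25)
print(f"normal approximation {approx:.4f}, exact binomial {exact:.4f}")
```

For these limits the normal area comes out near .886 against an exact value near .919; the approximation improves as $N$ grows.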

Significance of the Symmetrical Binomial Distribution and the Normal Curve. The practical significance of the symmetrical binomial distribution and the normal curve is that certain conditions in real life appear to produce frequency distributions that have this form. Consider first a real situation that very closely parallels the combinatorial problem of the preceding sections. Let 10 unbiased coins (unbiased in the sense that they are physically uniform) be tossed at random a large number of times. 5

1 See Appendix to Chap. IV, pp. 74-76.

2 Ibid., pp. 68-76.

3 The expression $\exp[x]$ is merely another way of writing $e^x$. It is frequently used in this text to facilitate the printing of complicated equations.

4 For further discussion on how to use the normal probability table see pp. 120-123, 164-168.

5 Whether a method is random or not has to be determined largely by intuition and experience with similar experiments. See pp. 154-162.

Experience with such experiments indicates that the relative frequencies with which 0, 1, 2, . . . , 10 heads appear will be approximately the same as the probabilities of the binomial distribution. This is merely one instance of the law of large numbers referred to in Chap. II.

Again, suppose a large number of flour bags, each weighing 5 pounds at the start, are opened in succession and a certain quantity of flour is added or taken away in accordance with the following rule: When a bag is opened, 10 coins are tossed; an ounce of flour is added for each head that appears and an ounce is subtracted for each tail. For this experiment the law of large numbers suggests that the outcome will be a set of bags varying in weight approximately as follows:


Table 5.—Relative Frequencies of Bags of Flour of Specified Weights

   Weight of bag      Relative frequency
   4 lb. 6 oz.            1/1,024
   4 lb. 8 oz.           10/1,024
   4 lb. 10 oz.          45/1,024
   4 lb. 12 oz.         120/1,024
   4 lb. 14 oz.         210/1,024
   5 lb. 0 oz.          252/1,024
   5 lb. 2 oz.          210/1,024
   5 lb. 4 oz.          120/1,024
   5 lb. 6 oz.           45/1,024
   5 lb. 8 oz.           10/1,024
   5 lb. 10 oz.           1/1,024


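In other words, the distribution of weights will approximately conform to a symmetrical binomial distribution with a mean weight of 5 pounds and a standard deviation of $2\sqrt{2.5} = 3.162$ ounces. (A bag with $N_1$ heads deviates from 5 pounds by $N_1 - (10 - N_1) = 2N_1 - 10$ ounces, so its standard deviation is twice that of $N_1$, namely $2 \cdot \sqrt{10}/2 = 2\sqrt{2.5}$.)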
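The figures of Table 5 and the standard deviation of $2\sqrt{2.5}$ ounces can be reproduced with the sketch below. The constant and function names are hypothetical, and weights are carried in ounces (5 pounds = 80 ounces) to keep the arithmetic integral.

```python
from math import comb, sqrt

COINS = 10
START_OZ = 80        # 5 pounds = 80 ounces
OUNCES_PER_COIN = 1  # +1 oz per head, -1 oz per tail


def weight_distribution(coins: int, start_oz: int, step_oz: int) -> dict[int, float]:
    """Map final bag weight (ounces) to its probability.
    With h heads the net change is step_oz * (h - (coins - h)) = step_oz * (2h - coins)."""
    return {
        start_oz + step_oz * (2 * h - coins): comb(coins, h) / 2**coins
        for h in range(coins + 1)
    }


dist = weight_distribution(COINS, START_OZ, OUNCES_PER_COIN)
mean = sum(w * p for w, p in dist.items())
sd = sqrt(sum((w - mean) ** 2 * p for w, p in dist.items()))

for w, p in sorted(dist.items()):
    print(f"{w // 16} lb. {w % 16} oz.  {p * 1024:.0f}/1,024")
print(f"mean = {mean} oz, standard deviation = {sd:.3f} oz")
```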

If a much larger number of coins were employed and the amount of flour added or subtracted per head or tail were made very small, the distribution of weights would become practically continuous and would form a normal curve.

The two examples just given suggest "laboratory experiments" by which a symmetrical binomial distribution and, in the second case, a normal curve might be produced by real events. Certain conditions in everyday life that appear to parallel these laboratory experiments also produce symmetrical binomial distributions and normal curves. Among a number of animals, for example, the biological conditions appear to be such that the chance of male offspring is equal to the chance of female offspring. Distributions of the number of males per family of given size therefore closely approximate the symmetrical binomial distribution. Again, in many cases of physical measurement, there are a host of forces tending to cause slight positive and negative errors, with the result that the net error of measurement is generally distributed like a normal frequency curve. 1 The heights of adult males of the same race, the heights of adult females of the same race, grades of students, the durability of electric-light bulbs, and many other biological, psychological, and physical variables are likewise approximately normally distributed.

Summary of Conditions Leading to Symmetrical Distribution. The foregoing analysis suggests that, whenever the following conditions exist in real life, the data generated by these conditions will tend to be distributed in the form of a symmetrical binomial distribution and, if certain other conditions are also present, in the form of a normal curve.

A. The conditions giving rise to the symmetrical binomial distribution may be stated as follows:

1. In the absence of certain "causes" of variation or in the event of a perfect balancing of their effects, the data assume a fixed central value (the 5 pounds of the flour illustration).

2. Deviations from this central value result from certain causes of variation, the effect of any cause being either to add a fixed quantity to the data or to subtract the same quantity (to add or subtract 1 ounce of flour).

3. The probability of a cause of variation producing a positive effect equals the probability of its producing a negative effect, that is, $P(+) = P(-) = \tfrac12$ (the probability of a head equals the probability of a tail).

4. The effects of all contributory causes of variation are of equal magnitude (each adds or subtracts 1 ounce of flour).

5. The contributory causes are independent in their action. That is, the probability of a positive or negative contribution by any causal factor is independent of the previous contributions of other causal factors; the sets of deviations generated by the various causes are all independent.

6. The total deviation of any element from its central value is the algebraic sum of the positive and negative contributions of the individual causal factors (the total amount of flour added to or subtracted from a bag is the sum of the ounces added for each head tossed minus the ounces subtracted for each tail tossed).

1 For a more extended discussion see Smith and Duncan, Elementary Statistics and Applications, pp. 294-295.

B. If, in addition to the above conditions, the following also exist, then the resulting distribution will tend to conform to the normal curve:

7. The number of contributory causes is very large (a large number of coins, instead of only 10, is tossed).

8. The positive and negative contributions of each cause are very small (.01 ounce, say, is added or subtracted instead of 1 ounce).

It is to be noted that, so far as the normal curve is concerned, not all these conditions are necessary for its generation. The foregoing conditions will produce it, but it can be shown that the normal curve may also occur when some of these conditions are absent. The normal curve will still be produced if conditions 2 and 3 are relaxed so that a causal factor may affect the data in varying degree and with varying probabilities, and also if condition 4 is only approximately and not exactly true. 1 Under certain conditions the requirement of independence (condition 5) may also be relaxed.

The most important conditions for the normal curve are 6, 7, and 8, and condition 4 in an approximate form. For example, in the case of the flour illustration, the resulting weights of the bags of flour would still tend to be normally distributed even if biased dice instead of unbiased coins were used and if the amount of flour added or subtracted varied with the result of the throw (say, .001 ounce for the occurrence of a one, −.002 ounce for the occurrence of a two, .003 ounce for the occurrence of a three, −.004 ounce for the occurrence of a four, etc.), provided that the number of dice thrown were very large and the amount added or subtracted per die were very small and of about the same order of magnitude from die to die. The normal curve is thus a more general phenomenon than the symmetrical binomial distribution. Mathematically, the normal curve can be derived from a great variety of different assumptions. 2

1 See pp. 137-142.

2 Cf. Czuber, Emanuel, Theorie der Beobachtungsfehler, B. G. Teubner, Leipzig, 1891.
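The claim that the dice version still tends toward normality can be illustrated without simulation. For independent causes the variance, the third central moment, and the fourth cumulant ($\mu_4 - 3\mu_2^2$) all add, so for $n$ identical dice $\beta_1$ and $\beta_2 - 3$ both shrink in proportion to $1/n$. The sketch below assumes a particular biased die (the face probabilities are invented for illustration) and extends the text's "etc." with +.005 and −.006 ounce for a five and a six; neither assumption comes from the text, and the helper names are hypothetical.

```python
from fractions import Fraction

# Hypothetical biased die: face probabilities (assumed, not from the text).
FACE_PROBS = [Fraction(3, 10), Fraction(1, 5), Fraction(3, 20),
              Fraction(3, 20), Fraction(1, 10), Fraction(1, 10)]
# Ounces added for faces one to six: +.001, -.002, +.003, -.004, and (assumed) +.005, -.006.
FACE_EFFECTS = [Fraction(k, 1000) * (1 if k % 2 else -1) for k in range(1, 7)]


def single_die_moments(probs, effects):
    """Mean and second, third, fourth central moments of one die's contribution."""
    mean = sum(p * e for p, e in zip(probs, effects))
    mu2, mu3, mu4 = (sum(p * (e - mean) ** j for p, e in zip(probs, effects))
                     for j in (2, 3, 4))
    return mean, mu2, mu3, mu4


def betas_of_sum(n_dice: int):
    """beta_1 and beta_2 of the total deviation produced by n independent dice.
    Variance and third central moment add over independent dice; so does the
    fourth cumulant mu4 - 3*mu2**2."""
    _, mu2, mu3, mu4 = single_die_moments(FACE_PROBS, FACE_EFFECTS)
    var, third = n_dice * mu2, n_dice * mu3
    fourth_cumulant = n_dice * (mu4 - 3 * mu2**2)
    beta1 = third**2 / var**3
    beta2 = 3 + fourth_cumulant / var**2
    return beta1, beta2


for n in (1, 10, 1000):
    b1, b2 = betas_of_sum(n)
    print(f"{n:>5} dice: beta1 = {float(b1):.6f}, beta2 = {float(b2):.4f}")
```

Working with moments rather than random throws keeps the result exact and repeatable; a simulation would show the same drift of $\beta_1$ toward 0 and $\beta_2$ toward 3, but only up to sampling error.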



CHAPTER IV 

THE PEARSONIAN SYSTEM OF FREQUENCY CURVES 

In the preceding chapter the conditions that would produce the symmetrical binomial distribution and the normal curve were outlined and briefly discussed. It was pointed out, however, that not all the conditions laid down are necessary for the production of the normal curve, and further consideration was postponed until this chapter. This more extensive examination will now be undertaken.

ASYMMETRICAL BINOMIAL DISTRIBUTION 

Derivation. Beginning in this section, some of the more important conditions that lead to nonnormality in homogeneous data will be examined. The discussion will again begin with some problems in the realm of combinatorial analysis.

Suppose there are 10 prisms, each of which has three faces marked with an H and the fourth marked with a T. The probability of an H on each prism is thus $\tfrac34$ and the probability of a T is $\tfrac14$. This differs from the previous coin problem in that the probabilities of the two attributes are not equal. As will be indicated later, it is an abandonment of condition 3 of the previously listed conditions for normality.

The particular problem that will be discussed is the determination of the probabilities of the various types of combinations that may be made from independent selection of the faces of these 10 prisms. The types of combinations will be the same as those of the coin problem (an H may be viewed as a "head" and a T as a "tail"), but the probabilities of each will be found to be different. For convenience let the various prisms be designated by the letters A, B, C, . . . , J.

Analysis of the problem begins, once again, with the determination of the probability of a particular combination. First consider the combination having no H's, viz.,

    A B C D E F G H I J
    T T T T T T T T T T
Since the probability of a T on each prism is $\tfrac14$ and since the face of each prism is selected independently of the others, the probability of all prisms being T's is, by the multiplication theorem for independent probabilities,

$$ \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 \cdot \tfrac14 = \left(\tfrac14\right)^{10} $$

Furthermore, the given combination is the only way in which no H's can occur. Hence the probability of no H's is just

$$ \frac{1}{1{,}048{,}576} $$
Consider next the combination

    A B C D E F G H I J
    H T T T T T T T T T

This is a combination having but one H. Since the probability of A being an H is $\tfrac34$ and the probability of each of the other prisms being a T is $\tfrac14$, the probability of this particular combination is

$$ \tfrac34\left(\tfrac14\right)^{9} = \frac{3}{1{,}048{,}576} $$

The same would be true of the combination

    A B C D E F G H I J
    T H T T T T T T T T

or in fact of any combination having but one H. Since there are 10 such combinations altogether, the probability of obtaining one or another of these 10 combinations is, by the addition theorem,

$$ 10\left(\tfrac34\right)\left(\tfrac14\right)^{9} = \frac{(10)(3)}{1{,}048{,}576} $$

This, then, is the probability of one H.

The following is a combination having but two H's:

    A B C D E F G H I J
    H H T T T T T T T T

The probability of this particular combination is

$$ \left(\tfrac34\right)^{2}\left(\tfrac14\right)^{8} = \frac{9}{1{,}048{,}576} $$

This is also the probability of any particular combination having but two H's; and since there are 45 such combinations, the probability of obtaining one or another of them is

$$ 45\left(\tfrac34\right)^{2}\left(\tfrac14\right)^{8} = \frac{(45)(9)}{1{,}048{,}576} $$

This is the probability of two H's.

In general, the probability of $N_1$ H's with 10 prisms is given by the equation

$$ P(N_1) = \frac{10!}{N_1!\,(10 - N_1)!}\left(\tfrac34\right)^{N_1}\left(\tfrac14\right)^{10 - N_1} $$

For $N_1 = 0$ to $N_1 = 10$, this equation yields the results shown in Table 6.

Table 6.—Probabilities of Various Combinations of 10 Prisms, Each Having Three Sides Marked with an H and One with a T

   Combinations having       Probability
      0 H                         1/1,048,576
      1 H                        30/1,048,576
      2 H                       405/1,048,576
      3 H                     3,240/1,048,576
      4 H                    17,010/1,048,576
      5 H                    61,236/1,048,576
      6 H                   153,090/1,048,576
      7 H                   262,440/1,048,576
      8 H                   295,245/1,048,576
      9 H                   196,830/1,048,576
     10 H                    59,049/1,048,576
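Table 6 (and, read in reverse order, Table 7) can be regenerated with the sketch below. `prism_table` is a hypothetical helper; its numerators are to be divided by $4^{10} = 1{,}048{,}576$.

```python
from math import comb


def prism_table(n: int, h_faces: int = 3, faces: int = 4) -> list[int]:
    """Numerators of P(N1 H's) over faces**n, for prisms with h_faces of `faces` faces marked H.
    Each term is C(n, N1) * h_faces**N1 * (faces - h_faces)**(n - N1)."""
    t_faces = faces - h_faces
    return [comb(n, k) * h_faces**k * t_faces**(n - k) for k in range(n + 1)]


numerators = prism_table(10)
print(numerators, sum(numerators))
```

Calling `prism_table(10, h_faces=1)` gives the case with one H face and three T faces, which is the same list in reverse order.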


It will be noted that the probabilities of Table 6 are the successive terms of the expansion of $(\tfrac14 + \tfrac34)^{10}$. The distribution obtained is thus a "binomial" distribution, but it is no longer symmetrical. If $N$ prisms had been used, the probabilities obtained would have been the successive terms of the expansion of $(\tfrac14 + \tfrac34)^{N}$, and the equation for any one term would have been

$$ P(N_1) = \frac{N!}{N_1!\,(N - N_1)!}\left(\tfrac34\right)^{N_1}\left(\tfrac14\right)^{N - N_1} $$

or, if $N_2 = N - N_1$,

$$ P(N_1) = \frac{N!}{N_1!\,N_2!}\left(\tfrac34\right)^{N_1}\left(\tfrac14\right)^{N_2} $$


Table 7.—Probabilities of Various Combinations of 10 Prisms, Each Having Three Sides Marked with a T and One with an H

   Combinations having       Probability
      0 H                    59,049/1,048,576
      1 H                   196,830/1,048,576
      2 H                   295,245/1,048,576
      3 H                   262,440/1,048,576
      4 H                   153,090/1,048,576
      5 H                    61,236/1,048,576
      6 H                    17,010/1,048,576
      7 H                     3,240/1,048,576
      8 H                       405/1,048,576
      9 H                        30/1,048,576
     10 H                         1/1,048,576


If each prism is replaced by a group of objects 1 within which the probability of an H is $p_1$ and that of a T is $p_2$, if there are $N$ such groups, and if combinations are composed so as to consist of one object from each group, then the probabilities of the various types of combinations among the set of all possible combinations will be given by the equation (with $N_2 = N - N_1$)

$$ P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}\,p_2^{N_2} \qquad (1) $$

This is the most general equation for the binomial distribution. When $p_1 = p_2 = \tfrac12$, it is the equation for the symmetrical binomial distribution. When $p_1 \neq p_2$, it represents an asymmetrical binomial distribution.

1 The number of objects in each group is not relevant to the argument.

Character of the Asymmetrical Binomial Distribution. A graph of the probabilities of Table 6 is shown in Fig. 9. The asymmetrical character of the distribution is obvious and needs no comment. Table 7 and Fig. 10 show what the distribution of the number of H's would have been if the probabilities of H's and T's had been reversed, i.e., if the probability of an H had been $\tfrac14$ and the probability of a T $\tfrac34$. It will be noted that this alteration in the probabilities changes the skewness of the distribution from negative to positive. This is generally the case: if the probability of an H is greater than the probability of a T, the distribution of the number of H's will be negatively skewed; and if the probability of an H is less than the probability of a T, the distribution will be positively skewed. 1

[Fig. 9.—Asymmetrical binomial distribution, $N = 10$, $p_1 = \tfrac34$, $p_2 = \tfrac14$ (see Table 6).]
1 If the number of T's were taken as the attribute, the reverse would be true.

Mathematical analysis shows that in general the asymmetrical binomial distribution 1 has the following characteristics (the variable is the number of H's, $N_1$):

$$ \begin{aligned} \bar{X} &= Np_1 \\ \mathrm{Mo} &= \text{integer between } Np_1 - p_2 \text{ and } Np_1 + p_1 \\ \sigma &= \sqrt{Np_1p_2} \\ \beta_1 &= \frac{(p_2 - p_1)^2}{Np_1p_2} \\ \beta_2 &= 3 + \frac{1 - 6p_1p_2}{Np_1p_2} \end{aligned} \qquad (2) $$

1 These, of course, also apply to the symmetrical binomial distribution, in which case $p_1 = p_2 = \tfrac12$.

[Fig. 10.—Asymmetrical binomial distribution, $N = 10$, $p_1 = \tfrac14$, $p_2 = \tfrac34$ (see Table 7). Horizontal axis: number of H's; vertical axis: probability, 0 to 0.288.]

The sign of $\sqrt{\beta_1}$ should be taken as the sign of $p_2 - p_1$, so that it will be positive when the distribution is positively skewed and negative when it is negatively skewed. These equations are derived in the Appendix to this chapter. 1 It might be well for the nonmathematical student, however, to check them by calculating $\bar{X}$, $\sigma$, $\beta_1$, and $\beta_2$ of the distribution of Table 7 directly and then comparing the results with those given by Eqs. (2).*
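Following that suggestion, the sketch below compares direct moment calculations for the distribution of Table 7 with the values given by Eqs. (2), using exact fractions. The helper names are hypothetical; the direct side uses only the probabilities $C^{N}_{N_1}p_1^{N_1}p_2^{N_2}$.

```python
from fractions import Fraction
from math import comb


def direct_characteristics(n: int, p1: Fraction) -> dict:
    """Characteristics computed directly from the binomial probabilities."""
    p2 = 1 - p1
    probs = [comb(n, k) * p1**k * p2**(n - k) for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    mu = {j: sum((k - mean) ** j * p for k, p in enumerate(probs)) for j in (2, 3, 4)}
    mode = max(range(n + 1), key=lambda k: probs[k])
    return {
        "mean": mean,
        "variance": mu[2],
        "beta1": mu[3] ** 2 / mu[2] ** 3,
        "beta2": mu[4] / mu[2] ** 2,
        "mode": mode,
        "sign": (mu[3] > 0) - (mu[3] < 0),  # sign of the skewness
    }


def formula_characteristics(n: int, p1: Fraction) -> dict:
    """Characteristics given by Eqs. (2)."""
    p2 = 1 - p1
    npq = n * p1 * p2
    return {
        "mean": n * p1,
        "variance": npq,
        "beta1": (p2 - p1) ** 2 / npq,
        "beta2": 3 + (1 - 6 * p1 * p2) / npq,
        "mode_bounds": (n * p1 - p2, n * p1 + p1),
    }


N, P1 = 10, Fraction(1, 4)  # Table 7: probability of an H is 1/4
direct = direct_characteristics(N, P1)
formula = formula_characteristics(N, P1)
print(direct)
print(formula)
```

For $N = 10$, $p_1 = \tfrac14$ both sides agree on $\bar{X} = 5/2$, $\sigma^2 = 15/8$, $\beta_1 = 2/15$, and $\beta_2 = 44/15$, and the mode (2 H's) falls between $Np_1 - p_2 = 1.75$ and $Np_1 + p_1 = 2.75$.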

Asymmetrical Binomial Distribution and the Normal Curve. If $N$ is increased and if the horizontal scale is reduced (and the vertical scale enlarged) in proportion to $\sqrt{N}$, the tops of the ordinates of the asymmetrical binomial distribution will tend to trace out a smooth curve, just as did the symmetrical binomial distribution. In general, the character of this curve will depend on the size of $N$ and on the difference between $p_1$ and $p_2$.

Equations (2) suggest that, if $N$ is very large, any binomial distribution, asymmetrical or symmetrical, will be approximated fairly well by a normal frequency curve. For as $N$ is increased, $\beta_1$ approaches 0 and $\beta_2$ approaches 3, which are the values of these coefficients for a normal curve. This implication is borne out by rigorous mathematical analysis. 2 It is to be concluded, then, that, if $N$ is very large, an asymmetrical binomial distribution is approximated by a normal frequency curve, just as was the symmetrical binomial distribution.

The conclusion that an asymmetrical curve can be approximated by a symmetrical curve would seem at first glance to be paradoxical. It is to be noted, however, that this is true only when $N$ is very large, and in that case the binomial distribution will not be very asymmetrical. For, as just indicated, the skewness of the asymmetrical binomial distribution diminishes as $N$ increases. This follows directly from the fact that

$$ \beta_1 = \frac{(p_2 - p_1)^2}{Np_1p_2} $$

It may also be seen from any graph showing how a particular binomial distribution, i.e., one with a given $p_1$ and a given $p_2$, changes its shape as $N$ is increased. Such a graphic comparison is presented in Fig. 11, for which the data are shown in Tables 8 and 9.


1 See pp. 65-68.

* The direct calculation of these quantities is the same as that carried out in Smith and Duncan, Elementary Statistics and Applications, pp. 284-285, for the symmetrical binomial distribution.

2 See Appendix to this chapter (pp. 68-74).

The size of $N$ that must be attained before the asymmetrical binomial distribution becomes practically symmetrical and can be approximated by the normal curve depends on the relative size of $p_1$ and $p_2$. For, as the formula for $\beta_1$ shows, if the difference between $p_1$ and $p_2$ is very great, then $N$ must be so much the larger if $\beta_1$ is to be close to zero. On the other hand, if $p_1$ and $p_2$ are relatively close, then $N$ need not be very large to make the distribution approximately symmetrical.

Table 8.—Probabilities of Various Combinations of Four Prisms, Each Having Three Sides Marked with an H and One with a T (N = 4)

   Combinations having     Probability
      0 H                     .004
      1 H                     .047
      2 H                     .211
      3 H                     .422
      4 H                     .316

Table 9.—Probabilities of Various Combinations of 16 Prisms, Each Having Three Sides Marked with an H and One with a T (N = 16)

   Combinations having     Probability
      0 H                     .00000000023
      1 H                     .000000012
      2 H                     .00000025
      3 H                     .0000035
      4 H                     .000034
      5 H                     .00025
      6 H                     .00135
      7 H                     .00583
      8 H                     .01966
      9 H                     .05243
     10 H                     .11010
     11 H                     .18015
     12 H                     .22519
     13 H                     .20787
     14 H                     .13363
     15 H                     .05345
     16 H                     .01002
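The dependence of the required $N$ on $p_1$ and $p_2$ can be made concrete with the formula for $\beta_1$. In the sketch below the threshold of .01 for "close to zero" is an arbitrary assumption, and the helper names are hypothetical; the same code reproduces the probabilities of Tables 8 and 9.

```python
from math import comb


def binomial_probs(n: int, p1: float) -> list[float]:
    """P(N1) for N1 = 0..n, from Eq. (1)."""
    p2 = 1 - p1
    return [comb(n, k) * p1**k * p2**(n - k) for k in range(n + 1)]


def beta1(n: int, p1: float) -> float:
    """beta_1 = (p2 - p1)**2 / (n p1 p2), from Eqs. (2)."""
    p2 = 1 - p1
    return (p2 - p1) ** 2 / (n * p1 * p2)


def smallest_n_for_beta1(p1: float, tolerance: float) -> int:
    """Smallest N with beta_1 <= tolerance; the tolerance is an arbitrary choice."""
    n = 1
    while beta1(n, p1) > tolerance:
        n += 1
    return n


for n in (4, 16):
    print(n, [round(p, 5) for p in binomial_probs(n, 0.75)], round(beta1(n, 0.75), 4))
print(smallest_n_for_beta1(0.75, 0.01), smallest_n_for_beta1(0.55, 0.01))
```

With that threshold, $p_1 = \tfrac34$ requires $N = 134$, whereas $p_1 = .55$ requires only $N = 5$.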

Asymmetrical Binomial Distribution and Pearson's Type III Curve. Although the asymmetrical binomial distribution approaches the normal curve when $N$ is large enough to make $\beta_1$ small, it is nevertheless of interest to find the type of curve that approximates this distribution in those cases when it is still markedly skewed. One approach to this problem is suggested by the previous analysis of the symmetrical binomial. The normal curve, it will be recalled (see pp. 34-35), was found to be the curve that had the same relative slope at various points as did the symmetrical binomial. This suggests that the curve that will give a good fit to the asymmetrical binomial will also be one for which the relative slope at various points is the same as the relative slope of the asymmetrical binomial distribution. Such a line of attack was adopted by Karl Pearson, and the curve that he found to have this property is the one whose logarithmic form is

$$ \log_{10} y = \log_{10} y_0 + ka\,\log_{10}\left(1 + \frac{x}{a}\right) - kx\,\log_{10} e \qquad (3) $$

where $x$ represents a deviation from the mode of the curve, $y_0$ the height of the curve at the mode, $y$ the height of the curve at any point $x$, $\log_{10} e = .434294$, and $a$ and $k$ are constants depending on the $p_1$, $p_2$, and $N$ of the binomial distribution from which the curve is derived.* This curve has come to be known as Pearson's type III curve. 1 The way in which it fits the asymmetrical binomial distribution of Table 10 is shown in Fig. 12.

[Fig. 11.—Comparison of two asymmetrical binomial distributions (see Tables 8 and 9).]

[Fig. 12.—Comparison of asymmetrical binomial (solid line) with Pearson's type III curve (broken line) (see Table 10).]

* More specifically, $x = N_1 - Np_1 + p_2 - \tfrac12$; $\log_{10} y_0 = (ka + 1)\log_{10} ka - \log_{10}(ka)! - \log_{10} a - ka\,\log_{10} e$; $a = \dfrac{2(N + 1)p_1p_2}{p_2 - p_1}$; and $k = \dfrac{2}{p_2 - p_1}$. See Appendix to this chapter (pp. 76-79).


Table 10.—Comparison of Asymmetrical Binomial Distribution with Pearson's Type III Curve

    X      Asymmetrical binomial      Pearson's type III
           distribution 1 P(X)        curve P'(X)
    0         .056,314                   .061,245
    1         .187,712                   .181,660
    2         .281,568                   .272,720
    3         .250,282                   .243,300
    4         .145,998                   .144,220
    5         .058,399                   .061,333
    6         .016,222                   .019,826
    7         .003,090                   .005,098
    8         .000,386                   .001,079
    9         .000,029                   .000,192
   10         .000,001                   .000,030

1 Derived from Table 7 (p. 43) by converting the fractions into decimals.
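The fit shown in Table 10 can be recomputed from Eq. (3) by taking $k = 2/(p_2 - p_1)$, $a = 2(N + 1)p_1p_2/(p_2 - p_1)$, and $x = N_1 - Np_1 + p_2 - \tfrac12$, with $N = 10$, $p_1 = \tfrac14$, $p_2 = \tfrac34$ as in Table 7. The sketch works in natural logarithms, which is equivalent to dividing Eq. (3) through by $\log_{10} e$, and uses `math.lgamma` for $\log (ka)!$. The helper name is hypothetical, and small differences in the last digits of Table 10 are to be expected from the rounding in the original computation.

```python
from math import comb, exp, lgamma, log


def type_iii_ordinates(n: int, p1: float) -> list[float]:
    """Pearson type III ordinates fitted to a binomial with P(H) = p1, using
    k = 2/(p2 - p1), a = 2(n + 1)p1p2/(p2 - p1), and x = N1 - n*p1 + p2 - 1/2.
    Requires p2 > p1 (positive skew), as in Table 7."""
    p2 = 1 - p1
    k = 2 / (p2 - p1)
    a = 2 * (n + 1) * p1 * p2 / (p2 - p1)
    ka = k * a
    # log y0 = (ka + 1) log(ka) - log Gamma(ka + 1) - log a - ka   (natural logs)
    log_y0 = (ka + 1) * log(ka) - lgamma(ka + 1) - log(a) - ka
    ordinates = []
    for n1 in range(n + 1):
        x = n1 - n * p1 + p2 - 0.5  # deviation from the mode of the curve
        ordinates.append(exp(log_y0 + ka * log(1 + x / a) - k * x))
    return ordinates


binomial = [comb(10, k) * 0.25**k * 0.75**(10 - k) for k in range(11)]
curve = type_iii_ordinates(10, 0.25)
for x, (p, q) in enumerate(zip(binomial, curve)):
    print(f"{x:>2}  {p:.6f}  {q:.6f}")
```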


HYPERGEOMETRICAL DISTRIBUTION 

In the previous section it has been seen that, when the probabilities of the two attributes are not equal, the result is an asymmetrical instead of a symmetrical binomial distribution. The bearing of this upon the generation of nonnormal frequency distributions in real life will be discussed below. 2 Before turning to these broader aspects of the analysis, however, it is desirable to consider the effects of other modifications of the conditions for normality. It is of interest in particular to consider what happens when the condition of independence (condition 5) 3 is removed. This will now be studied in some detail.

1 The equation for this curve was first published by Karl Pearson in the Proceedings of the Royal Society of London, Vol. 54 (1893), p. 331. It was later discussed at some length in his fundamental paper on frequency curves, "Contributions to the Mathematical Theory of Evolution. II. Skew Variation in Homogeneous Material." See the Philosophical Transactions of the Royal Society of London, Series A, Vol. 186 (1895), pp. 356-360. Also see W. P. Elderton, Frequency Curves and Correlation, Cambridge University Press, London (1927), pp. 45, 90-94.

2 See pp. 60ff.

3 See p. 38.

Derivation of the Hypergeometrical Distribution. To show the effect of dependent probabilities on the form of a frequency distribution, consider the following card problem. Suppose there are 10 packs of 52 cards, each containing at the start 13 spades, 13 hearts, 13 diamonds, and 13 clubs. Suppose further that all possible combinations are made by selecting one card from each pack in order, subject to the condition that if a card has already been selected from one pack the same card is not eligible for selection from subsequent packs. For example, if the ten of spades is selected from the first pack, then the ten of spades is not eligible for selection from any other pack. Again, if the ten of spades is selected from the first pack, the five of hearts from the second pack, and the ace of clubs from the third pack, then neither the ten of spades, nor the five of hearts, nor the ace of clubs is eligible for selection from packs 4 to 10. Given this restriction on the selection of cards, let the problem be to find the probabilities of combinations containing various numbers of spades in the whole set of combinations of 10 cards that may be formed in the way described. To facilitate the analysis let the cards that make up a combination be represented by letters from A to J.

Consider first the combination

    A B C D E F G H I J
    O O O O O O O O O O

where O stands for a card other than a spade. This is a combination containing no spades. The probability of the first card, A, in the combination being other than a spade is $\tfrac{39}{52}$, for there are 39 nonspades among the 52 cards open for selection from the first pack. The probability of the second card, B, being other than a spade is $\tfrac{38}{51}$, for after a nonspade has been selected from the first pack there are only 51 cards left for selection from the second pack and only 38 of these are nonspades. In like manner the probability of the third card, C, being other than a spade is $\tfrac{37}{50}$, etc.

By the multiplication theorem for dependent probabilities the probability of the combination containing no spades is therefore

$$ \frac{39}{52}\cdot\frac{38}{51}\cdot\frac{37}{50}\cdot\frac{36}{49}\cdot\frac{35}{48}\cdot\frac{34}{47}\cdot\frac{33}{46}\cdot\frac{32}{45}\cdot\frac{31}{44}\cdot\frac{30}{43} $$

Consider next the combination

    A B C D E F G H I J
    S O O O O O O O O O

This is a combination containing only one spade. Its probability by the multiplication theorem for dependent probabilities is equal to the probability of a spade in the first pack (that is, $\tfrac{13}{52}$), multiplied by the probability of a nonspade in the second pack after a spade has been selected from the first pack (that is, $\tfrac{39}{51}$), times the probability of a nonspade in the third pack after both a spade and a nonspade have been selected from the first and second packs (that is, $\tfrac{38}{50}$), etc. In short, the probability of this particular combination is

$$ \frac{13}{52}\cdot\frac{39}{51}\cdot\frac{38}{50}\cdot\frac{37}{49}\cdot\frac{36}{48}\cdot\frac{35}{47}\cdot\frac{34}{46}\cdot\frac{33}{45}\cdot\frac{32}{44}\cdot\frac{31}{43} $$

There are 9 other combinations, however, that contain only one spade, and in each case the probability can be shown to be the same as that just computed. Hence, by the addition theorem, the probability of obtaining one or another of these 10 combinations, i.e., the probability of a combination containing one spade, is

$$ 10\cdot\frac{13}{52}\cdot\frac{39}{51}\cdot\frac{38}{50}\cdot\frac{37}{49}\cdot\frac{36}{48}\cdot\frac{35}{47}\cdot\frac{34}{46}\cdot\frac{33}{45}\cdot\frac{32}{44}\cdot\frac{31}{43} $$

Consider next the following combination:

    A B C D E F G H I J
    S S O O O O O O O O

This is a combination containing two spades. The probability of this particular combination is equal to the probability of a spade in the first pack (that is, $\tfrac{13}{52}$), times the probability of a spade in a deck from which a spade has been drawn (that is, $\tfrac{12}{51}$), times the probability of a nonspade in a deck from which two spades have been drawn (that is, $\tfrac{39}{50}$), times the probability of a nonspade in a deck from which two spades and a nonspade have been drawn (that is, $\tfrac{38}{49}$), etc. In short, the probability of this particular combination is

$$ \frac{13}{52}\cdot\frac{12}{51}\cdot\frac{39}{50}\cdot\frac{38}{49}\cdot\frac{37}{48}\cdot\frac{36}{47}\cdot\frac{35}{46}\cdot\frac{34}{45}\cdot\frac{33}{44}\cdot\frac{32}{43} $$

There are $C^{10}_2 = 45$ different combinations, however, containing only two spades, and the probability of each can be shown to be the same as that just computed. Hence the probability of obtaining one or another of these 45 combinations, i.e., the probability of two spades, is

$$ 45\cdot\frac{13}{52}\cdot\frac{12}{51}\cdot\frac{39}{50}\cdot\frac{38}{49}\cdot\frac{37}{48}\cdot\frac{36}{47}\cdot\frac{35}{46}\cdot\frac{34}{45}\cdot\frac{33}{44}\cdot\frac{32}{43} $$


A continuation of this same line of reasoning shows that in general the probability of $N_1$ spades out of 10 is

$$ P(N_1) = C^{10}_{N_1}\,\frac{(13)(12)\cdots(13 - N_1 + 1)\,(39)(38)\cdots(39 - 10 + N_1 + 1)}{(52)(51)(50)(49)(48)(47)(46)(45)(44)(43)} $$

This yields the distribution shown in Table 11.


Table 11.—Probabilities of the Various Possible Numbers of Spades among All the Combinations of 10 Cards Each That Might Be Made of the Cards in an Ordinary Playing Deck

   Combinations containing      Probability
      0 spades                   40,186/1,000,000
      1 spade                   174,140/1,000,000
      2 spades                  303,340/1,000,000
      3 spades                  278,070/1,000,000
      4 spades                  147,460/1,000,000
      5 spades                   46,840/1,000,000
      6 spades                    8,922/1,000,000
      7 spades                      991/1,000,000
      8 spades                       60/1,000,000
      9 spades                        2/1,000,000
     10 spades                     0.02/1,000,000
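Table 11 can be checked with exact fractions. The sketch computes each probability both in the product form of the general equation above and as a ratio of counts of unordered hands, $C^{13}_{N_1}C^{39}_{10 - N_1}/C^{52}_{10}$; the second form is a standard identity brought in here only for comparison, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, prod


def spades_product_form(n: int, n1: int, spades: int = 13, deck: int = 52) -> Fraction:
    """C(n, n1) times the probability of one particular order of n1 spades and n - n1 others."""
    others = deck - spades
    num = prod(spades - i for i in range(n1)) * prod(others - i for i in range(n - n1))
    den = prod(deck - i for i in range(n))
    return comb(n, n1) * Fraction(num, den)


def spades_count_form(n: int, n1: int, spades: int = 13, deck: int = 52) -> Fraction:
    """The same probability as a ratio of counts of unordered hands."""
    return Fraction(comb(spades, n1) * comb(deck - spades, n - n1), comb(deck, n))


probs = [spades_product_form(10, n1) for n1 in range(11)]
for n1, p in enumerate(probs):
    print(f"{n1:>2} spades: {float(p) * 1_000_000:,.2f} per 1,000,000")
```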


If each of $N$ packs contains $S$ cards, $p_1S$ of which are spades and $p_2S$ of other suits, $p_1 + p_2$ being equal to 1, and if all possible combinations of $N$ cards each are formed by selecting a different card from each pack, the probability of $N_1$ spades will be given by the general equation

$$ P(N_1) = C^{N}_{N_1}\,\frac{(p_1S)(p_1S - 1)\cdots(p_1S - N_1 + 1)\,(p_2S)(p_2S - 1)\cdots(p_2S - N + N_1 + 1)}{(S)(S - 1)\cdots(S - N + 1)} \qquad (4) $$

This expression is a term in a hypergeometrical series, 1 and the distribution of Table 11 may be called a hypergeometrical distribution. A somewhat simpler equation that is equivalent to Eq. (4)* is as follows:

$$ P(N_1) = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1)!} $$


In conclusion, it is to be noted that the percentages of 0, 1,
2, . . . , $N_1$ spades among combinations formed by selecting
a different card from each of N packs are the same as the
percentages of 0, 1, 2, . . . , $N_1$ spades among combinations of N cards
from a single pack. That these two problems are the same and
have the same solution may be shown as follows: In selecting the
first card to form a combination, the situation is obviously the
same for both problems, since pack 1 of the former problem and
the original pack in the new problem are both complete packs.
In selecting the second card, the situation continues to be the
same for both problems. For, in the former problem, the cards
eligible for selection from pack 2 are, according to the terms
of the problem, all the cards except the card that matches the
one drawn from pack 1; in the present problem the cards eligible
for selection are by necessity only the cards left in the pack after
the first card has been selected. Similarly, in selecting the third,
fourth, fifth, or Nth card, the situation is the same for both
problems. Hence it follows that the two problems are identical
in all respects and therefore have the same solution.

¹ As $N_1$ goes from 0 to N, this equation generates the series

$$\frac{(p_2S)\cdots(p_2S - N + 1)}{(S)(S - 1)\cdots(S - N + 1)}\left[1 + \frac{N(p_1S)}{1!\,(p_2S - N + 1)} + \frac{N(N - 1)(p_1S)(p_1S - 1)}{2!\,(p_2S - N + 1)(p_2S - N + 2)} + \cdots\right]$$

A general hypergeometrical series is of the form

$$1 + \frac{(a)(b)}{1!\,(c)}\,x + \frac{(a)(a + 1)(b)(b + 1)}{2!\,(c)(c + 1)}\,x^2 + \cdots$$

The part of the former expression in brackets forms a hypergeometrical
series in which $x = 1$, $a = -N$, $b = -p_1S$, and $c = p_2S - N + 1$.

* This is obtained from Eq. (3) by multiplying numerator and denominator
by $(p_1S - N_1)!\,(p_2S - N + N_1)!\,(S - N)!\,(N - N_1)!$.
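The footnote's identification of the bracketed series can be checked exactly. This sketch builds each series term from the previous one using the general ratio $(a + j)(b + j)x/[(j + 1)(c + j)]$, with $a = -N$, $b = -p_1S$, $c = p_2S - N + 1$ and $x = 1$. It multiplies each term by the leading factor and compares the result with Eq. (4). The helper names `falling` and `series_terms` are illustrative. The sketch assumes $p_2S \ge N$, so that no denominator $c + j$ vanishes.

```python
from fractions import Fraction
from math import comb


def falling(x: int, k: int) -> int:
    """Falling factorial x(x - 1)...(x - k + 1); equals 1 when k = 0."""
    result = 1
    for i in range(k):
        result *= x - i
    return result


def series_terms(n: int, s: int, k: int) -> list:
    """P(0), ..., P(N) generated as leading factor times terms of the bracketed series.

    Uses the footnote's parameters a = -N, b = -p1*S (= -k), c = p2*S - N + 1, x = 1.
    Assumes p2*S >= N, so that c >= 1 and no denominator vanishes.
    """
    a, b, c = -n, -k, (s - k) - n + 1
    assert c >= 1, "sketch assumes p2*S >= N"
    # Leading factor (p2S)...(p2S - N + 1) / (S)...(S - N + 1), i.e. the N1 = 0 probability.
    lead = Fraction(falling(s - k, n), falling(s, n))
    terms, term = [], Fraction(1)
    for j in range(n + 1):
        terms.append(lead * term)
        # Next series term = current term * (a + j)(b + j) / ((j + 1)(c + j)) * x, with x = 1.
        term *= Fraction((a + j) * (b + j), (j + 1) * (c + j))
    return terms


terms = series_terms(10, 52, 13)
print([float(t) for t in terms])
```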

In general, therefore, if all possible combinations of N cards
are made from a deck of S cards, in which the proportion of spades
is $p_1$ and the proportion of nonspades is $p_2$, where $p_1 + p_2 = 1$,
the relative frequencies of combinations containing $N_1$ spades
are those given by Eq. (4). This conclusion is of importance in
certain types of sampling problems.¹

Character of the Hypergeometrical Distribution. It will be
noted from Fig. 13 that the distribution of Table 11 is skewed.
In general, the mean of the distribution is at $Np_1$ and the various
moments about the mean are²

$$\begin{aligned}
\mu_2 &= Np_1p_2\left(1 - \frac{N - 1}{S - 1}\right) \\
\mu_3 &= Np_1p_2(p_2 - p_1)\left(1 - \frac{N - 1}{S - 1}\right)\left(1 - \frac{2(N - 1)}{S - 2}\right) \\
\mu_4 &= \frac{Np_1p_2}{(S - 2)(S - 3)}\left(1 - \frac{N - 1}{S - 1}\right)\Big[S(S + 1) - 6N(S - N) + 3p_1p_2\{S^2(N - 2) - SN^2 + 6N(S - N)\}\Big]
\end{aligned} \qquad (5)$$

It is to be noted that the factor $1 - 2(N - 1)/(S - 2)$ in $\mu_3$ is negative when $2N > S$, or $N/S > \tfrac{1}{2}$.

² “On the Curves Which Are Most Suitable for Describing the Frequency
of Random Samples of a Population,” Biometrika, Vol. 5 (1906-1907),
pp. 172-175.
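The closed forms of Eq. (5) are easy to mistype, so it can help to compare them with moments summed directly from Eq. (4). The sketch below does this. It uses exact rational arithmetic (`fractions.Fraction`), so that agreement means equality rather than closeness. The helper names and the test cases, which include Table 11's, are illustrative only. S must exceed 3.

```python
from fractions import Fraction
from math import comb


def exact_moments(n: int, s: int, k: int):
    """Mean and central moments mu2, mu3, mu4 summed directly from Eq. (4)."""
    probs = [Fraction(comb(k, j) * comb(s - k, n - j), comb(s, n)) for j in range(n + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    central = tuple(sum((j - mean) ** r * p for j, p in enumerate(probs)) for r in (2, 3, 4))
    return (mean,) + central


def eq5_moments(n: int, s: int, k: int):
    """Mean N*p1 and mu2, mu3, mu4 from the closed forms of Eq. (5); needs S > 3."""
    p1 = Fraction(k, s)
    p2 = 1 - p1
    f = 1 - Fraction(n - 1, s - 1)  # (S - N)/(S - 1)
    mu2 = n * p1 * p2 * f
    mu3 = mu2 * (p2 - p1) * (1 - Fraction(2 * (n - 1), s - 2))
    q = s**2 * (n - 2) - s * n**2 + 6 * n * (s - n)
    mu4 = mu2 * (s * (s + 1) - 6 * n * (s - n) + 3 * p1 * p2 * q) / ((s - 2) * (s - 3))
    return n * p1, mu2, mu3, mu4


cases = [(10, 52, 13), (10, 52, 26), (5, 20, 4)]
for case in cases:
    print(case, exact_moments(*case) == eq5_moments(*case))
```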

These equations show that, if $p_1$ does not equal $p_2$, the hypergeometrical
distribution is generally asymmetrical. If $p_1$ is less
than $p_2$, it will be skewed positively; and if $p_1$ is greater
than $p_2$, it will be skewed negatively (provided, in both cases, that
$2N < S$). If $p_2 = p_1$, the distribution will be symmetrical. The
equations also show that the greater the difference between $p_2$ and
$p_1$, and hence the smaller the product $p_1p_2$, the larger will be the
coefficient of kurtosis $\beta_2 = \mu_4/\mu_2^2$.¹

If S is made indefinitely large relative to N, i.e., if the number
of cards in a pack is made indefinitely large relative to the
number of cards selected to form a combination, then $p_1S$,
$p_1S - 1$, . . . , $p_1S - N_1 + 1$ all become practically equivalent to
$p_1S$; similarly, $p_2S$, $p_2S - 1$, . . . , $p_2S - N + N_1 + 1$ become
practically equivalent to $p_2S$, and S, $S - 1$, $S - 2$, . . . ,
$S - N + 1$ become practically equivalent to S. Hence, as S is
made indefinitely large relative to N, Eq. (3) becomes practically
the same as Eq. (1), and the hypergeometrical distribution
reduces to the asymmetrical binomial distribution (or to the
symmetrical binomial distribution if $p_1 = p_2$). This is also
shown by the fact that when S is increased relative to N the
equations for $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$ of the hypergeometrical
distribution (see Eq. 5) approach the equations for $\beta_1$ and $\beta_2$ of
the asymmetrical binomial distribution [Eqs. (2)]. Such a
relationship is logically to be expected; for if the size of each pack
of cards is very large relative to the number of cards making up a
combination, the probability of a spade in the pack is not much
changed in the course of forming combinations. That is, the
problem reduces to the one considered in the previous section.
Viewed in this manner, the binomial distribution is a special case
of the hypergeometrical distribution.

¹ For $\beta_2$ reduces to the form

$$\beta_2 = \frac{3r}{t} + \frac{m}{p_1p_2t},$$

where m, r, and t are constants depending only on N and S.
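The limiting argument can be illustrated numerically. In this sketch, $p_1$ is held at 1/4 and N at 10 while S grows. For each S it prints the largest gap between the hypergeometrical probabilities of Eq. (4) and the binomial probabilities of Eq. (1). The helper functions are our own, and the sequence of pack sizes is arbitrary.

```python
from math import comb


def hypergeom_pmf(n1: int, n: int, s: int, k: int) -> float:
    """Eq. (4): n1 spades among n cards from s cards of which k are spades."""
    return comb(k, n1) * comb(s - k, n - n1) / comb(s, n)


def binom_pmf(n1: int, n: int, p1: float) -> float:
    """Eq. (1): binomial probability of n1 spades when p1 stays fixed from draw to draw."""
    return comb(n, n1) * p1**n1 * (1 - p1) ** (n - n1)


N, P1 = 10, 0.25
gaps = []
for s in (52, 520, 5_200, 52_000):
    k = round(P1 * s)  # spades in a pack of s cards, keeping p1 = 1/4
    gap = max(abs(hypergeom_pmf(j, N, s, k) - binom_pmf(j, N, P1)) for j in range(N + 1))
    gaps.append(gap)
    print(f"S = {s:6d}: largest difference from the binomial = {gap:.2e}")
```

With N fixed, the gap shrinks roughly in proportion to 1/S. This fits the argument that factors such as $p_1S - 1$ become practically equivalent to $p_1S$.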

Hypergeometrical Distribution and the Pearsonian System of
Frequency Curves. The hypergeometrical distribution forms the
basis of the Pearsonian system of frequency curves. Having
noted that the normal curve had the same relative slope at various
points as the symmetrical binomial distribution and that
Pearson's type III curve had the same relative slope at various points
as the asymmetrical binomial distribution, Karl Pearson raised the
question: What curve has the same relative slope at various points
as the hypergeometrical distribution? Such a curve, he contended,
would be a generalized frequency curve; for in the derivation
of the hypergeometrical distribution no assumption was
made as to the equality of the probabilities, nor was any assumption
made as to independence.

Now it can be shown¹ that the relative slope of the hypergeometrical
distribution (i.e., the relative slope of the frequency
polygon that the distribution forms) is given for various
midpoints, $X = N_1 + \tfrac{1}{2}$, by the expression

$$\text{Relative slope} = \frac{X + a}{b_0 + b_1X + b_2X^2} \qquad (6)$$

where $a$, $b_0$, $b_1$, and $b_2$ depend on the values of S, N, $p_1$, and $p_2$
of the hypergeometrical distribution. Equation (6), therefore,
was taken by Pearson as a general equation capable of representing
any frequency distribution.

¹ See Appendix to this chapter (p. 79).

The characteristics of a particular curve represented by this
equation will depend on the values of $a$, $b_0$, $b_1$, and $b_2$. The
parameter $a$ is of significance in determining the position of the
curve. For the slope of the curve is zero at its peak (assuming
the curve to be a smooth one), and the mode of the curve is
therefore given by $X + a = 0$. That is, the mode comes at
$X = -a$. The other parameters determine the general shape
of the curve. Thus, if the denominator of Eq. (6) is factored
into the product of two expressions, viz.,

$$b_0 + b_1X + b_2X^2 = b_2\left(X - \frac{-b_1 + \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right)\left(X - \frac{-b_1 - \sqrt{b_1^2 - 4b_0b_2}}{2b_2}\right),$$

it is seen that the form of the curve will depend on the value of
$b_1^2 - 4b_0b_2$. This, it will be noted, is the so-called “discriminant”
of the equation $b_0 + b_1X + b_2X^2 = 0$, since its sign determines
the nature of its roots. The discriminant may also be written

$$4b_0b_2\left(\frac{b_1^2}{4b_0b_2} - 1\right).$$

Accordingly, Karl Pearson took $b_1^2/4b_0b_2$ as
the criterion of curve form.¹ On the basis of this criterion, he
distinguished three main types of curves. If the criterion is
negative, i.e., if the roots of the equation $b_0 + b_1X + b_2X^2 = 0$
are real and of opposite sign,² the curve belongs to the class of
curves distinguished by Pearson as type I. If the criterion is
positive but less than unity, i.e., if the roots of the equation
$b_0 + b_1X + b_2X^2 = 0$ are imaginary, the curve belongs to the
class of curves distinguished by Pearson as type IV. Finally, if
the criterion is positive but greater than unity, i.e., if the roots of
the equation $b_0 + b_1X + b_2X^2 = 0$ are real and of the same sign,
the curve belongs to the class of curves distinguished by Pearson
as type VI.*

¹ For the correct use of the Pearsonian criterion, X should be measured
from the mean.

² Equation (6) is more general than the hypergeometrical distribution
from which it was derived, for the constants of a hypergeometrical
distribution will never give rise to a type I curve (see Appendix to this chapter,
p. 81). Having been suggested by the hypergeometrical distribution,
Eq. (6) was taken by Pearson as a general frequency equation in which the
constants could have any values whatsoever, whether or not they satisfied
the requirements of a hypergeometrical distribution. Other considerations
suggested the reasonableness of this. For example, if the causes of variation
in X remained the same over the whole range, they would generate a normal
distribution, for which the relative slope would be $-x/\sigma^2$. If, however,
the causes of variation varied with X, then σ would become a function of X,
and the relative slope might be written $-x/[\sigma(X)]^2$. If the function $[\sigma(X)]^2$ is
expanded in a power series (Taylor's expansion) in the neighborhood of the
mean of X and the first three terms of this expansion are taken as a good
approximation to the function, the equation for the relative slope becomes
$-x/(f_0 + f_1x + f_2x^2)$, which is essentially the same as Eq. (6). (Cf. Pearson,
Karl, “Das Fehlergesetz und seine Verallgemeinerungen durch Fechner und
Pearson. A Rejoinder,” Biometrika, Vol. 4 (1905-1906), p. 204.)



Fig. 14.—Type IV frequency curve. Number exposed to risk of sickness
according to Sutton's sickness tables (males, all durations).

Fig. 15.—Type I frequency curve. Number exposed to risk of sickness.
(According to Watson, M.U. Tables, p. 19.)

These are the three main classes of Pearsonian frequency
curves. A type I curve is shown in Fig. 15. Curves of this
type are limited in range. They are usually bell shaped, although
skewed, but may under special circumstances be even U shaped,
J shaped, or twisted J shaped. Figure 14 shows a type IV curve.
These curves are unlimited in range, bell shaped, and skewed.
Finally, Fig. 16 shows a type VI curve. These are unlimited
in range in one direction; they are usually bell shaped and
skewed but may also be J shaped.¹

* See Elderton, op. cit., pp. 42-43. Figures 14, 15, and 16 are reproduced
by permission of the author and publishers of this book, 2d ed. (1927), pp.
62, 69, 77.

Various “transitional” types of curves have also been distinguished,
depending on special values of the b's. For example,
if $b_2 = 0$ and the criterion is accordingly infinite, the result is
Pearson's type III curve discussed above.² If both $b_2$ and $b_1$ are
zero, the result is the normal frequency curve. A table of
criterion values and curve types is given on page 135.
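The classification by Pearson's criterion can be written as a small function. The sketch below follows the rules stated in the text for types I, III, IV and VI and for the normal curve. It reports as "transitional" any case the text leaves aside, such as a criterion of exactly 0 or 1, or $b_0 = 0$. The function name and the sample coefficients are hypothetical. X is taken to be measured from the mean, as the text's footnote on the criterion requires.

```python
def pearson_type(b0: float, b1: float, b2: float) -> str:
    """Classify Eq. (6), relative slope (X + a)/(b0 + b1 X + b2 X^2), by the rules in the text.

    X is assumed to be measured from the mean. Cases the text does not classify
    are returned as "transitional".
    """
    if b1 == 0 and b2 == 0:
        return "normal"        # both b1 and b2 are zero
    if b2 == 0:
        return "III"           # criterion b1^2/(4 b0 b2) is infinite
    if b0 == 0:
        return "transitional"  # criterion undefined; not covered here
    criterion = b1**2 / (4 * b0 * b2)
    if criterion < 0:
        return "I"             # real roots of opposite sign
    if 0 < criterion < 1:
        return "IV"            # imaginary roots
    if criterion > 1:
        return "VI"            # real roots of the same sign
    return "transitional"      # criterion exactly 0 or 1


# Hypothetical coefficients, one for each of several cases.
for coeffs in [(-1.0, 0.5, 0.2), (1.0, 1.0, 1.0), (1.0, 3.0, 1.0), (-2.0, 0.3, 0.0)]:
    print(coeffs, pearson_type(*coeffs))
```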

General Significance of the Pearsonian System of Frequency
Curves. Explanation of Nonnormal Distributions. The foregoing
analysis is of considerable significance in explaining the
conditions that give rise to various types of nonnormal frequency
distributions. To illustrate this, consider once again the flour-bag
experiment described in the previous chapter.³ Suppose as
before that the experimenter has a large number of bags, each
weighing exactly 5 pounds. Suppose, however, that the amount
added or subtracted from each bag is now dependent upon the
throwing of 10 prisms, each of which has three faces marked
with a T and one with an H. As previously, the experimenter
adds an ounce of flour to a bag for each H that faces down and
subtracts an ounce for each T that faces down. Under the
assumptions that the prisms are thrown in a random fashion,
intuition suggests that a frequency distribution of the weights
of a large number of bags will approximate the form of an
asymmetrical binomial distribution for which $p_1 = \tfrac{1}{4}$ and $p_2 = \tfrac{3}{4}$. It
would be like the relative frequency distribution presented in
Table 12.

¹ See ibid., chart opposite p. 46.

² See pp. 47-50.

³ Cf. pp. 36-37. Also see Smith and Duncan, op. cit., pp. 292-293.

Table 12.—Frequency Distribution of a Large Number of Bags of
Flour of Specified Weights

Weight of bag      Relative frequency
4 lb. 6 oz.         59,049/1,048,576
4 lb. 8 oz.        196,830/1,048,576
4 lb. 10 oz.       295,245/1,048,576
4 lb. 12 oz.       262,440/1,048,576
4 lb. 14 oz.       153,090/1,048,576
5 lb. 0 oz.         61,236/1,048,576
5 lb. 2 oz.         17,010/1,048,576
5 lb. 4 oz.          3,240/1,048,576
5 lb. 6 oz.            405/1,048,576
5 lb. 8 oz.             30/1,048,576
5 lb. 10 oz.             1/1,048,576

The general shape of this distribution can be represented by a
Pearsonian type III curve. Owing in this particular case to the
greater probability of subtracting an ounce than of adding an
ounce, the mean weight of the bags will be less than the original
“central value” of 5 pounds, and the distribution will be skewed
positively. If the prisms had had more H than T faces, the
probability of adding an ounce would have exceeded the probability
of subtracting an ounce, the mean weight would have been
greater than 5 pounds, and the distribution of weights would have
been skewed negatively.

If, instead of adding or subtracting flour in accordance with
the number of H's and T's appearing on the throws of 10 prisms,
the additions and subtractions had been made with reference
to the number of spades and the number of other cards found
among 10 cards drawn without replacement from a pack of 52
playing cards, then the distribution of weights would have tended
to conform to a hypergeometrical distribution and it could have 
been represented by one of the Pearsonian curves given by Eq. (6).
In this case, with an ounce added for each spade and subtracted
for each other card, the relative frequencies
with which bags of 4 pounds 6 ounces, bags of 4 pounds 8 ounces,
etc., would have tended to occur would have been those given in
Table 11, and the general shape of the distribution would have
been described by a Pearsonian curve of type IV.* Since the
probability of adding the initial ounce (i.e., the probability of a
spade on the first draw) is again less than the probability of
subtracting the initial ounce (i.e., the probability of a nonspade
on the first draw), the distribution of weights is again skewed
positively and its mean is less than the central value of 5 pounds.

If an ounce had been added when a nonspade was drawn and
subtracted when a spade was drawn, then the distribution of
weights would have been skewed negatively and the mean weight
would have been greater than 5 pounds. If the probability of
adding the initial ounce had been equal to the probability of
subtracting it, as would have been the case if an ounce were
added whenever a black card appeared and subtracted whenever
a red card appeared, then the distribution of weights would have
been symmetrical about the normal value of 5 pounds, but
the kurtosis of the distribution would have been less than 3;
i.e., the distribution would have been less peaked than a normal
curve.¹
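The three versions of the flour-bag experiment can be compared directly. In this sketch, weights are measured in ounces (80 oz = 5 lb), so a bag that receives j additions out of 10 weighs 70 + 2j ounces. The prism case uses the binomial frequencies of Table 12. The spade case uses Eq. (4) with the values of Table 11. The black/red case uses Eq. (4) with 26 black cards. The helper names are our own.

```python
from fractions import Fraction
from math import comb


def weight_distribution(pmf):
    """Bag weight in ounces for j additions and 10 - j subtractions: 80 + j - (10 - j)."""
    return {70 + 2 * j: p for j, p in enumerate(pmf)}


def summary(dist):
    """Mean, third central moment, and beta2 = mu4/mu2^2 of a {weight: probability} table."""
    mean = sum(w * p for w, p in dist.items())
    mu2, mu3, mu4 = (sum((w - mean) ** r * p for w, p in dist.items()) for r in (2, 3, 4))
    return mean, mu3, mu4 / mu2**2


prisms = [Fraction(comb(10, j) * 3 ** (10 - j), 4**10) for j in range(11)]             # Table 12
spades = [Fraction(comb(13, j) * comb(39, 10 - j), comb(52, 10)) for j in range(11)]  # Table 11
black = [Fraction(comb(26, j) * comb(26, 10 - j), comb(52, 10)) for j in range(11)]   # 26 black cards

for name, pmf in [("prisms", prisms), ("spades", spades), ("black/red", black)]:
    mean, mu3, beta2 = summary(weight_distribution(pmf))
    print(f"{name:9s}: mean = {float(mean):5.2f} oz, mu3 = {float(mu3):+8.3f}, beta2 = {float(beta2):.3f}")
```

In the output, the prism and spade schemes both have a mean of 75 ounces and a positive third moment. The black/red scheme has a mean of exactly 80 ounces, a zero third moment, and a $\beta_2$ below 3. All of this is in line with the text.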

Thus, whenever the deviation of a variable from some central
value is the sum of the contributions of a number of
causal factors, the absolute size of each contribution being the
same, then if (1) the contributory causes are independent of each
other, but the probability of a positive contribution is greater
or less than the probability of a negative contribution, or (2) the
contributory causes are not independent of each other, the
resulting distribution of the variable will tend to be skewed in
the direction for which the probability is the smaller
and will tend to be more or less peaked than the normal curve.

* See Appendix to this chapter (p. 81).

¹ Cf. p. 56.
Although this conclusion as to the causes of skewness and
kurtosis in homogeneous variates would appear to be generally
applicable, one point needs to be further considered. The
asymmetrical binomial distribution and the hypergeometrical
distribution, upon which this conclusion was based, are both
discrete distributions. The number of H's and the number of
spades can vary only by integral units. The weights of the
various bags of flour derived from either the binomial or the
hypergeometrical distribution differed exactly by multiples of
2 ounces; there were no bags of fractional weights. Although
the foregoing conclusion would therefore appear to be a valid
explanation of the distributions of discrete variates, such as the
number of veins in a leaf, the number of petals on a flower, and
the number of individuals in a litter, its validity as an explanation
of the distributions of continuous variates, i.e., of frequency
curves proper, has yet to be established. This will now be
considered.

Difficulty in Explaining Nonnormal Curves. In the case of the
symmetrical binomial distribution, it was assumed that if the
number of contributory causes was made infinitely large and
if each contribution was made infinitesimally small, the resulting
fluctuations in the variate would become practically continuous.
In that instance, it was found that the distribution of the
continuous variate was of the form of the normal frequency curve.

If the same method of attaining continuity, however, is applied
to the asymmetrical binomial and to the hypergeometrical
distributions, the analysis immediately runs into difficulties. For,
as already pointed out,¹ if the number of prisms is made very
large, the asymmetrical binomial distribution diminishes in
skewness; and as N approaches infinity, the distribution approaches
the symmetrical form. The same thing is true of the
hypergeometrical distribution. If the number of cards in a pack is
made larger and larger, the proportion of spades always remaining
the same, and if the number of cards making up a combination is
also increased, the hypergeometrical distribution likewise
diminishes in skewness and its coefficient of kurtosis approaches
3.* The consequence is that, if continuity of a variate is the
result of the number of contributory causes being infinitely
large while their individual effects are infinitesimally small, the
distribution of the variate is of the form of the normal frequency
curve and the foregoing analysis fails to explain nonnormal
curves.

¹ See p. 46.

* This is shown by Eq. (5). With $p_1$ held fixed, if both S and N are
increased, such terms as $(N - 1)/(S - 1)$ are little changed; but the N that
appears in the denominator when $\mu_3^2$ is divided by $\mu_2^3$ is uncompensated
for, and this causes $\beta_1$ to get smaller and smaller as N is increased. The
same is true for $\beta_2 = \mu_4/\mu_2^2$; for as N is increased, all terms in the ratio
except 3 reduce to 0.

Karl Pearson recognized this difficulty in his analysis and
sought to meet it as follows: That distributions of continuous
biological and physical variates were nonnormal was a fact that
had been demonstrated by empirical investigation. “It is clear,”
argued Pearson, “that if such frequency curves . . . are to be
treated as chance distributions at all, it would be idle to compare
them to the limit of a symmetrical binomial. We are really
quite ignorant as to the nature of the contributory ‘causes’ in
biological, physical, or economic frequency curves. The
continuity of such frequency curves may depend upon other features
than the magnitude of n. . . . It may possibly be that
continuity in biological or physical frequency curves may arise from
a limited number of ‘contributory causes’ with a power of
fractionizing the result.”¹ For example, in the flour-bag experiment,
the amount of flour added or subtracted in each case could be
taken as a handful instead of an exact ounce. In such a case,
the distribution of the weights of the bags of flour would tend to
form a smooth curve even when the number of prisms used was
small.²

¹ “Contributions to the Mathematical Theory of Evolution. II. Skew
Variation in Homogeneous Material,” Philosophical Transactions of the
Royal Society of London, Series A, Vol. 186 (1895), p. 358.

² Cf. Pearson, Karl, “Das Fehlergesetz und seine Verallgemeinerungen
durch Fechner und Pearson. A Rejoinder,” Biometrika, Vol. 4 (1905-1906),
p. 206.

THE PEARSONIAN SYSTEM OF FREQUENCY CURVES 65

Thus, in proceeding by argument from the discrete symmetrical
binomial to the continuous normal curve, the element
of discreteness is blurred and finally obliterated by making the
number of causes infinitely large and decreasing the contribution
of each cause. The same argument applied to the asymmetrical
binomial or to the hypergeometric series will also blur the
discreteness, but it will, in addition, cause the nonnormality to
disappear. In order to retain the nonnormality, it seems necessary
to keep the assumption of a small number of contributing
causes; the blurring of the discreteness is accomplished by the
assumption that the contributions of the contributing causes
vary among themselves.
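The footnote's claim that $\beta_1 \to 0$ and $\beta_2 \to 3$ when the pack and the combination grow together can be checked with the closed forms of Eq. (5). In this illustrative sketch, the pack keeps one quarter spades while S = 52m and N = 10m for m = 1, 10 and 100. This scaling is our choice, not the text's.

```python
from fractions import Fraction


def hypergeom_betas(n: int, s: int, k: int):
    """beta1 = mu3^2/mu2^3 and beta2 = mu4/mu2^2 from the closed forms of Eq. (5)."""
    p1 = Fraction(k, s)
    p2 = 1 - p1
    mu2 = n * p1 * p2 * Fraction(s - n, s - 1)
    mu3 = mu2 * (p2 - p1) * Fraction(s - 2 * n, s - 2)
    q = s**2 * (n - 2) - s * n**2 + 6 * n * (s - n)
    mu4 = mu2 * (s * (s + 1) - 6 * n * (s - n) + 3 * p1 * p2 * q) / ((s - 2) * (s - 3))
    return mu3**2 / mu2**3, mu4 / mu2**2


# Grow the pack and the combination together: S = 52m, N = 10m, one quarter spades.
results = []
for m in (1, 10, 100):
    beta1, beta2 = hypergeom_betas(10 * m, 52 * m, 13 * m)
    results.append((float(beta1), float(beta2)))
    print(f"m = {m:3d}: beta1 = {float(beta1):.5f}, beta2 = {float(beta2):.5f}")
```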

In the Pearsonian analysis of nonnormal frequency curves,
then, it is necessary to suppose that nonnormal series in real life
arise because the causes affecting variability are few in number
and their contributions vary in magnitude. This supposition
is not unreasonable when applied, for example, to such
things as the distribution of income, wage rates, and other
similar economic phenomena that are known to have nonnormal
static variability. In these and other economic and social
phenomena the factors causing static variability may reasonably
be supposed to be not only finite but also comparatively few in
number.

APPENDIX

MATHEMATICAL PROOFS

A. Proof That, for Any Binomial Distribution Represented by
the Equation $P(N_1) = \dfrac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2}$, the Mean $= Np_1$,
$\sigma = \sqrt{Np_1p_2}$, $\beta_1 = \dfrac{(p_2 - p_1)^2}{Np_1p_2}$, and $\beta_2 = 3 + \dfrac{1 - 6p_1p_2}{Np_1p_2}$. [Since Eqs. (3) of
Chap. II are merely a special case of the foregoing for which
$p_1 = p_2 = \tfrac{1}{2}$, the following is also a proof of them.]

By definition the mean of any distribution of probability is
equal to $\sum P(X)X$ [Chap. I, Eq. (2)]. Hence the mean of the
binomial distribution is

$$\sum P(N_1)N_1 = \sum \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2}\,N_1$$

where the summation is for $N_1$ from 0 to N.

If $N_1$ is canceled out of the numerator and out of $N_1!$ in the
denominator, however, and if $Np_1$ is factored out of the resulting
expression, the value of the mean becomes (the term for $N_1 = 0$
vanishes, so the sum now runs from $N_1 = 1$ to N)

$$Np_1 \sum \frac{(N - 1)!}{(N_1 - 1)!\,N_2!}\,p_1^{N_1 - 1}p_2^{N_2}$$

Also, by definition $N_2 = N - N_1$, which may be written
$[N - 1 - (N_1 - 1)]$, and the sum may thus be put in the form

$$\sum \frac{(N - 1)!}{(N_1 - 1)!\,[N - 1 - (N_1 - 1)]!}\,p_1^{N_1 - 1}p_2^{N - 1 - (N_1 - 1)}$$

This is recognized,¹ however, to be the expansion of $(p_1 + p_2)^{N - 1}$,
which equals 1 since $p_1 + p_2 = 1$. Hence the mean of the
binomial distribution reduces to $Np_1$.

The second moment of the binomial distribution about the
origin is, by definition, $\sum P(N_1)N_1^2$. But $N_1^2$ may be written
$N_1^2 = N_1(N_1 - 1) + N_1$. Hence

$$\sum P(N_1)N_1^2 = \sum P(N_1)[N_1(N_1 - 1) + N_1] = \sum P(N_1)N_1(N_1 - 1) + \sum P(N_1)N_1.$$

The second term of this has just been seen to be equal to $Np_1$,
and the first, when written in full, is

$$\sum \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2}\,N_1(N_1 - 1)$$

If, as in the case of the mean, the terms $N_1(N_1 - 1)$ in the
numerator are canceled against the first two factors of $N_1!$ in the
denominator, and if the quantity $N(N - 1)p_1^2$ is factored out of
the resulting expression, the above becomes

$$N(N - 1)p_1^2 \sum \frac{(N - 2)!}{(N_1 - 2)!\,N_2!}\,p_1^{N_1 - 2}p_2^{N_2}$$

The quantity $N_2$, however, which equals $N - N_1$, may be written
$[N - 2 - (N_1 - 2)]$; and when this is done, the sum is recognized
as the expansion of $(p_1 + p_2)^{N - 2}$, which equals 1. Hence the
second moment about the origin is equal to $N(N - 1)p_1^2 + Np_1$, and
the second moment about the mean is equal to this minus $(Np_1)^2$
(by the short equation, $\sigma^2 = \sum P(X)X^2 - \bar{X}^2$). Therefore,

$$\sigma^2 = N(N - 1)p_1^2 + Np_1 - N^2p_1^2 = Np_1 - Np_1^2 = Np_1(1 - p_1) = Np_1p_2,$$

and $\sigma = \sqrt{Np_1p_2}$.

¹ Cf. p. 43.

Again the third moment about the origin is equal to $\sum P(N_1)N_1^3$.
But $N_1^3$ may be written

$$N_1^3 = N_1(N_1 - 1)(N_1 - 2) + 3N_1(N_1 - 1) + N_1,$$

so that

$$\sum P(N_1)N_1^3 = \sum P(N_1)N_1(N_1 - 1)(N_1 - 2) + 3\sum P(N_1)N_1(N_1 - 1) + \sum P(N_1)N_1.$$

The second and third terms of this have already been proved to be
equal to $3N(N - 1)p_1^2$ and $Np_1$, respectively. Furthermore, by
canceling $N_1(N_1 - 1)(N_1 - 2)$ out of the first term, and by
factoring out $N(N - 1)(N - 2)p_1^3$, the sum part of this term
is reduced to the equivalent of $(p_1 + p_2)^{N - 3}$, which equals 1.
The whole expression for the third moment about the origin
thus becomes $\nu_3 = N(N - 1)(N - 2)p_1^3 + 3N(N - 1)p_1^2 + Np_1$.
By making use of Eq. (6), Chap. I, for converting moments
about an arbitrary origin to moments about the mean, the third
moment about the mean of the binomial distribution is found
to be

$$\begin{aligned}
\mu_3 &= N(N - 1)(N - 2)p_1^3 + 3N(N - 1)p_1^2 + Np_1 - 3N^3p_1^3 - 3N^2p_1^2(1 - p_1) + 2N^3p_1^3 \\
&= Np_1(2p_1^2 - 3p_1 + 1) = Np_1(1 - p_1)(1 - 2p_1) \\
&= Np_1p_2(p_2 - p_1).
\end{aligned}$$

Finally, $\beta_1 = \mu_3^2/\mu_2^3$ has the value

$$\beta_1 = \frac{N^2p_1^2p_2^2(p_2 - p_1)^2}{N^3p_1^3p_2^3} = \frac{(p_2 - p_1)^2}{Np_1p_2}.$$

In precisely the same way the fourth moment about the
mean can be shown to be equal to

$$\mu_4 = 3(Np_1p_2)^2 + Np_1p_2(1 - 6p_1p_2).$$

Since $\beta_2 = \mu_4/\mu_2^2$, it follows that

$$\beta_2 = 3 + \frac{1 - 6p_1p_2}{Np_1p_2}.$$
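The results of Proof A can be verified for particular cases by summing Eq. (1) term by term in exact arithmetic. The values of N and $p_1$ below are arbitrary illustrations. The function `closed_forms` (a name of our own) encodes the mean $Np_1$, $\sigma^2 = Np_1p_2$, $\mu_3 = Np_1p_2(p_2 - p_1)$, and the fourth moment stated above.

```python
from fractions import Fraction
from math import comb


def binomial_moments(n: int, p1: Fraction):
    """Mean and central moments mu2, mu3, mu4 of Eq. (1), summed term by term."""
    p2 = 1 - p1
    probs = [comb(n, j) * p1**j * p2 ** (n - j) for j in range(n + 1)]
    mean = sum(j * p for j, p in enumerate(probs))
    mu2, mu3, mu4 = (sum((j - mean) ** r * p for j, p in enumerate(probs)) for r in (2, 3, 4))
    return mean, mu2, mu3, mu4


def closed_forms(n: int, p1: Fraction):
    """Mean N*p1, mu2 = N p1 p2, mu3 = N p1 p2 (p2 - p1), mu4 = 3(N p1 p2)^2 + N p1 p2 (1 - 6 p1 p2)."""
    p2 = 1 - p1
    v = n * p1 * p2
    return n * p1, v, v * (p2 - p1), 3 * v**2 + v * (1 - 6 * p1 * p2)


cases = [(10, Fraction(1, 4)), (7, Fraction(2, 5)), (12, Fraction(1, 2))]
for n, p1 in cases:
    print(n, p1, binomial_moments(n, p1) == closed_forms(n, p1))
```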
B. Proof That the Mode of a Binomial Distribution Lies
between $Np_1 - p_2$ and $Np_1 + p_1$. If the ordinate of the binomial
distribution at $N_1$ is to be the modal ordinate (i.e., the highest),
it must satisfy the following criterion:

$$y_{N_1 - 1} \le y_{N_1} \ge y_{N_1 + 1}$$

or

$$\frac{N!\,p_1^{N_1 - 1}p_2^{N - N_1 + 1}}{(N_1 - 1)!\,(N - N_1 + 1)!} \le \frac{N!\,p_1^{N_1}p_2^{N - N_1}}{N_1!\,(N - N_1)!} \ge \frac{N!\,p_1^{N_1 + 1}p_2^{N - N_1 - 1}}{(N_1 + 1)!\,(N - N_1 - 1)!}$$

On taking out the common factor

$$\frac{N!\,p_1^{N_1 - 1}p_2^{N - N_1 - 1}}{(N_1 - 1)!\,(N - N_1 - 1)!},$$

these inequalities become

$$\frac{p_2^2}{(N - N_1 + 1)(N - N_1)} \le \frac{p_1p_2}{N_1(N - N_1)} \ge \frac{p_1^2}{(N_1 + 1)N_1}$$

From the first inequality it follows that

$$N_1p_2 \le (N - N_1 + 1)p_1$$

or that $N_1(p_1 + p_2) \le Np_1 + p_1$, or, since $p_1 + p_2 = 1$, that
$N_1 \le Np_1 + p_1$. From the second inequality it follows that
$p_2(N_1 + 1) \ge p_1(N - N_1)$, or that $N_1(p_1 + p_2) + p_2 \ge Np_1$, or
that $N_1 \ge Np_1 - p_2$. Hence, in summary, if $N_1$ is to be the
mode, it must be the integer lying between $Np_1 - p_2$ and $Np_1 + p_1$.
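Proof B can also be checked by brute force. The sketch below finds the highest ordinate of Eq. (1) for many values of N and $p_1$ and confirms that it lies between $Np_1 - p_2$ and $Np_1 + p_1$. When two ordinates tie, the function returns the lower one, which then sits exactly on the lower bound. The grid of values and the function name are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb


def binomial_mode(n: int, p1: Fraction) -> int:
    """Value of N1 with the largest ordinate of Eq. (1); the lower one if two tie."""
    probs = [comb(n, j) * p1**j * (1 - p1) ** (n - j) for j in range(n + 1)]
    return max(range(n + 1), key=lambda j: probs[j])  # max() keeps the first maximum


checks = []
for n in range(1, 31):
    for p1 in (Fraction(1, 10), Fraction(1, 4), Fraction(1, 3), Fraction(1, 2), Fraction(7, 9)):
        mode = binomial_mode(n, p1)
        # Bounds from Proof B: N*p1 - p2 <= mode <= N*p1 + p1.
        checks.append(n * p1 - (1 - p1) <= mode <= n * p1 + p1)
print(all(checks))
```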

C. Proof That the Binomial Distribution Is Approximated by
the Normal Curve. The general equation for the binomial
distribution is

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2} \qquad (1)$$

where $N_2 = N - N_1$ and $p_1 + p_2 = 1$ (cf. page 43). The
equation for the symmetrical binomial distribution (cf. page 33)
is the special case for which $p_1 = p_2 = \tfrac{1}{2}$. The following analysis
shows that the general binomial equation can be approximately
represented by the equation for the normal curve, the degree of
approximation depending on the size of N. The analysis thus
includes the special case of the symmetrical binomial distribution
as well as the more general asymmetrical binomial distribution.
If the variable $N_1$ is replaced by $x = N_1 - Np_1$, that is, by
the deviation of $N_1$ from its mean, the general equation (1)
becomes

$$P(x) = \frac{N!}{(Np_1 + x)!\,(Np_2 - x)!}\,p_1^{Np_1 + x}p_2^{Np_2 - x} \qquad (2)$$

This gives the probability of x cases more or less than the mean
number. The mean¹ of this distribution of x is, of course, 0,
but its standard deviation² is the same as that of $N_1$, viz., $\sqrt{Np_1p_2}$;
its $\beta_1$ is likewise $(p_2 - p_1)^2/Np_1p_2$, and its $\beta_2 = 3 + (1 - 6p_1p_2)/Np_1p_2$.

¹ Since $x = N_1 - Np_1$, the mean of x is

$$\sum P(x)x = \sum P(N_1)N_1 - Np_1\sum P(N_1).$$

But $\sum P(N_1)N_1 = Np_1$, the mean of $N_1$ (cf. p. 66), and $\sum P(N_1) = 1$.
Hence the mean of x is $Np_1 - Np_1 = 0$.

² Since the mean of x is zero, the variance of x, that is, the square of its
standard deviation, equals

$$\sum x^2P(x) = \sum (N_1 - Np_1)^2P(N_1) = \sum N_1^2P(N_1) - 2Np_1\sum N_1P(N_1) + N^2p_1^2\sum P(N_1).$$

But $\sum N_1P(N_1)$ is the mean of $N_1$ and equals $Np_1$. Also, $\sum P(N_1) = 1$.
Hence $\sum N_1^2P(N_1) - 2Np_1\sum N_1P(N_1) + N^2p_1^2\sum P(N_1) = \sum N_1^2P(N_1) - N^2p_1^2$,
which by the short equation (see p. 67) is the variance of $N_1$. Therefore
the standard deviation of x is the same as that of $N_1$.

A similar argument can be used to show that the third and fourth moments
of x about its mean are the same as the third and fourth moments of $N_1$
about its mean and hence that it has the same $\beta_1$ and $\beta_2$.

For simplicity assume that $Np_1$ and $Np_2$, and hence x, are
integers. This assumption does not materially affect the results
since the error involved is of order 1/N and the subsequent
argument will assume that N is so large that terms of order 1/N
or higher order may reasonably be neglected. In other words,
this assumption is good enough for the degree of approximation
given by the final equation.

Expression (2) may be written

$$P(x) = P(0)\,\frac{(Np_1)!\,(Np_2)!}{(Np_1 + x)!\,(Np_2 - x)!}\,p_1^{x}p_2^{-x} \qquad (3)$$

where

$$P(0) = \frac{N!}{(Np_1)!\,(Np_2)!}\,p_1^{Np_1}p_2^{Np_2},$$

or the value of P(x) when x = 0, its mean value. To Eq. (3)
Stirling's approximation for factorials may be applied. This is¹
$$a! = a^a e^{-a}\sqrt{2\pi a}$$

which is correct to 1/a, or as good an approximation as is being
sought. Making use of this, Eq. (3) becomes

$$\frac{P(x)}{P(0)} = \frac{(Np_1)^{Np_1}e^{-Np_1}\sqrt{2\pi Np_1}\,(Np_2)^{Np_2}e^{-Np_2}\sqrt{2\pi Np_2}\;p_1^{x}p_2^{-x}}{(Np_1 + x)^{Np_1 + x}e^{-Np_1 - x}\sqrt{2\pi(Np_1 + x)}\,(Np_2 - x)^{Np_2 - x}e^{-Np_2 + x}\sqrt{2\pi(Np_2 - x)}}$$

which reduces to²

$$\frac{P(x)}{P(0)} = \frac{1}{\left(1 + \dfrac{x}{Np_1}\right)^{Np_1 + x + \frac{1}{2}}\left(1 - \dfrac{x}{Np_2}\right)^{Np_2 - x + \frac{1}{2}}} \qquad (4)$$

A final step is to take logarithms and then expand these in a
power series.³ Thus,

$$\begin{aligned}
\log_e \frac{P(x)}{P(0)} &= -\left(Np_1 + x + \tfrac{1}{2}\right)\log_e\left(1 + \frac{x}{Np_1}\right) - \left(Np_2 - x + \tfrac{1}{2}\right)\log_e\left(1 - \frac{x}{Np_2}\right) \\
&= -\left(Np_1 + x + \tfrac{1}{2}\right)\left[\frac{x}{Np_1} - \frac{1}{2}\left(\frac{x}{Np_1}\right)^2 + \frac{1}{3}\left(\frac{x}{Np_1}\right)^3 - \cdots\right] \\
&\quad - \left(Np_2 - x + \tfrac{1}{2}\right)\left[-\frac{x}{Np_2} - \frac{1}{2}\left(\frac{x}{Np_2}\right)^2 - \frac{1}{3}\left(\frac{x}{Np_2}\right)^3 - \cdots\right]
\end{aligned}$$

Upon multiplying out and arranging in ascending powers of 1/N,
this becomes

$$\log_e \frac{P(x)}{P(0)} = -\frac{x^2 + x(p_2 - p_1)}{2Np_1p_2} + \frac{2x^3(p_2^2 - p_1^2) + 3x^2(p_1^2 + p_2^2)}{12N^2p_1^2p_2^2} + \text{terms of order } \frac{x^4}{N^3p_1^3p_2^3} \text{ and terms of higher order} \qquad (5)$$

¹ More exactly, Stirling's formula says that, as a approaches infinity,

$$a^a e^{-a}\sqrt{2\pi a} < a! < a^a e^{-a}\sqrt{2\pi a}\left(1 + \frac{1}{4a}\right).$$

Accordingly, $1 + 1/4a$ gives an estimate of the degree of accuracy of the
approximation. This formula may be demonstrated briefly by evaluating
the area under the curve $y = \log x$ between ordinates $x = 1$ and $x = n$,
which gives, by integration,

$$\text{Area} = \int_1^n \log x\,dx = \Big[x\log x - x\Big]_1^n = n\log n - n + 1.$$

An approximate estimate of this same area is obtained, however, by the
trapezoid formula [cf. R. Courant, Differential and Integral Calculus (1940),
Vol. I, p. 343], erecting ordinates at $x = 1, x = 2, x = 3, \ldots, x = n$.
This gives

$$\text{Area} \approx \tfrac{1}{2}\log 1 + \log 2 + \log 3 + \cdots + \log(n - 1) + \tfrac{1}{2}\log n = \log n! - \tfrac{1}{2}\log n,$$

since log 1 = 0. Consequently, $\log n! - \tfrac{1}{2}\log n \approx n\log n - n + 1$, and hence

$$\log n! \approx \left(n + \tfrac{1}{2}\right)\log n - n + 1,$$

so that $n! \approx 2.718\,n^n e^{-n}\sqrt{n}$, which is very close to Stirling's
formula, $n^n e^{-n}\sqrt{2\pi n}$ (note that $\sqrt{2\pi} \approx 2.507$). For a more precise derivation of Stirling's
formula, see ibid., Vol. I, pp. 361-364.

² By dividing both numerator and denominator by
$(Np_1)^{Np_1 + x + \frac{1}{2}}(Np_2)^{Np_2 - x + \frac{1}{2}}$ and canceling out like terms in numerator and
denominator. Note that $p_1 + p_2 = 1$.

³ The value of $\log_e(1 + a)$ is given approximately by

$$\log_e(1 + a) = a - \frac{a^2}{2} + \frac{a^3}{3} - \cdots$$

for $-1 < a \le 1$.

To demonstrate this, note that by the integral calculus

$$\log_e(1 + a) = \int_0^a \frac{dt}{1 + t}.$$

If the division is carried out to n terms and the integration carried
out term by term, it follows that

$$\log_e(1 + a) = a - \frac{a^2}{2} + \frac{a^3}{3} - \cdots + (-1)^{n-1}\frac{a^n}{n} + (-1)^n\int_0^a \frac{t^n}{1 + t}\,dt.$$

For $0 \le a \le 1$, the integral term is at most $a^{n+1}/(n + 1)$, which approaches 0 as
n approaches ∞. If $-1 < a \le 0$, the absolute value of the integral term
is less than or equal to $|a|^{n+1}/[(n + 1)(1 + a)]$, which also approaches 0.
(Cf. Courant, op. cit., Vol. I, pp. 315-317.) Since the standard deviation of x equals $\sqrt{Np_1p_2}$, it follows
that x is of order $\sqrt{N}$. Hence $x/Np_1$ and $x/Np_2$ are of order $1/\sqrt{N}$, and if N
is taken large enough, $|x|/Np_1$ and $|x|/Np_2$ will become
less than 1 for the important part of the distribution lying between
$x/\sigma = -3$ and $x/\sigma = 3$, say.

Now the equation for the standard deviation of x, viz.,

$$\sigma = \sqrt{Np_1p_2},$$

shows that variation in x is of order $\sqrt{N}$. In other words,
as N increases, the variation in x increases as $\sqrt{N}$. Hence,
terms such as $x/Np_1p_2$ are of order $\sqrt{N}/N$ or $1/\sqrt{N}$, and terms
such as $x^2/N^2p_1^2p_2^2$ are of order $N/N^2$ or 1/N. If in Eq. (5) terms
of order 1/N and those of higher order are neglected, then


VP1P2 ^ p2 Pl ^ + 6 A^pfpl ^ 


6e P( 0) 2JVpip 2 2A^ 2 “ Pl ' + 6A^Wl 

or since p| — p? = (p 2 — pi)(p 2 + Pi) and p 2 + pi = 1 
_ £ 2 ( x x s \ r»o - 


‘ U6 ' p (°) 2jVp!p 2 Va'PiPs : w ¥?p ?; P? ‘P 2 
On replacing iVpjp 2 by etc., this becomes 

log-. UAl = _£!_ _ (x ^ p 2 - pj 

g p( o) a»» irw/ir 

or, on taking antilogarithms, 

/ Pa —pA /x a’3 \ 

P(x) = P(0)e 2 & e ^ 26 /\d 3dV ^ 

To complete this equation, $P(0)$ must be evaluated. This may be done as follows: By definition, $P(0)$ equals

$$P(0) = \frac{N!}{(Np_1)!\,(Np_2)!}\,p_1^{Np_1}p_2^{Np_2}$$

Using Stirling's formula,

$$P(0) \approx \frac{N^N e^{-N}\sqrt{2\pi N}\;p_1^{Np_1}p_2^{Np_2}}{(Np_1)^{Np_1}e^{-Np_1}\sqrt{2\pi Np_1}\;(Np_2)^{Np_2}e^{-Np_2}\sqrt{2\pi Np_2}}$$

which reduces to

$$P(0) = \frac{1}{\sqrt{2\pi Np_1p_2}} = \frac{1}{\sigma\sqrt{2\pi}}$$

all within an error of order $1/N$ or $1/\sigma^2$.

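Equation (6) may thus be rewritten

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\,e^{-\frac{p_2 - p_1}{2\sigma}\left(\frac{x}{\sigma} - \frac{x^3}{3\sigma^3}\right)} \qquad (7)$$

This gives the value of $P(x)$ within a margin of error of order $1/\sigma^2$ or $1/N$. If $N$ is sufficiently large so that terms of order $1/\sigma$ or $1/\sqrt{N}$ can reasonably be neglected, Eq. (7) reduces to

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2} \qquad (8)$$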

This is the equation for the normal distribution. 
It will be noted that, when $p_1 = p_2$, Eqs. (7) and (8) become identical. The normal curve is thus a better approximation to the symmetrical than to the asymmetrical binomial distribution, as common sense suggests. Equation (8), however, will also give results very close to those of (7) wherever the difference $p_2 - p_1$ is sufficiently small relative to $\sigma$. That is, if $p_1$ and $p_2$ differ little absolutely, so that the binomial distribution is only slightly skewed even for small values of $N$, or if $N$ is so large that

$$\frac{p_2 - p_1}{\sigma} = \frac{p_2 - p_1}{\sqrt{Np_1p_2}}$$

is small, then the normal distribution becomes a satisfactory "first" approximation even to the asymmetrical binomial distribution. But if $N$ is not very large and if $p_1$ differs radically from $p_2$, then the binomial distribution is rather skewed and the second approximation (7) had better be used. Of course,
if N is very small, then it is preferable to use the binomial formula 
itself, for in this case neglect of terms of order 1/N may lead to 
serious error. 
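The comparison among the exact binomial ordinates, the normal approximation (8), and the skew-corrected approximation (7) can be sketched numerically as follows. The parameter values ($N = 100$, $p_1 = .1$) and the function names are illustrative assumptions, and the comparison is restricted to $|x| \le 3\sigma$ because the correction factor in Eq. (7) was derived for that central region and is not meant for the far tails.

```python
import math

def binomial_ordinate(n1: int, N: int, p1: float) -> float:
    # exact probability of n1 occurrences of the event whose probability is p1
    return math.comb(N, n1) * p1 ** n1 * (1 - p1) ** (N - n1)

def normal_eq8(x: float, sigma: float) -> float:
    # Eq. (8): normal ordinate at deviation x from the mean N p1
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def skew_eq7(x: float, sigma: float, p1: float) -> float:
    # Eq. (7): Eq. (8) times exp[-(p2 - p1)/(2 sigma) * (x/sigma - x^3/(3 sigma^3))]
    p2 = 1 - p1
    z = x / sigma
    return normal_eq8(x, sigma) * math.exp(-(p2 - p1) / (2 * sigma) * (z - z ** 3 / 3))

N, p1 = 100, 0.1
mean = N * p1
sigma = math.sqrt(N * p1 * (1 - p1))  # sigma = sqrt(N p1 p2)

# only the central part of the distribution, |x| <= 3 sigma
central = [k for k in range(N + 1) if abs(k - mean) <= 3 * sigma]
err7 = sum(abs(binomial_ordinate(k, N, p1) - skew_eq7(k - mean, sigma, p1)) for k in central)
err8 = sum(abs(binomial_ordinate(k, N, p1) - normal_eq8(k - mean, sigma)) for k in central)
print(f"total absolute error, Eq. (7): {err7:.4f}   Eq. (8): {err8:.4f}")
```

With $p_1$ this far from $p_2$, the skew-corrected Eq. (7) should show the smaller total error, in line with the advice above. At $x = 0$ the two approximations coincide, since the correction factor equals 1 there.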

Equation (8) shows that the ordinates of the binomial distribution may be approximated by the ordinates of the normal curve. It will be noted that, as $N$ increases, the ordinates of the normal curve, as well as the ordinates of the binomial distribution (cf. page 34), tend to get smaller and smaller and the curve becomes more spread out. This is because $N$ enters into the formula for $\sigma$ (that is, $\sigma = \sqrt{Np_1p_2}$). The larger the value of $N$, therefore, the larger the standard deviation and, according to Eq. (8), the smaller any particular ordinate (for $\sigma$ enters into the denominator of this equation). If, however, the scales on which the curve is graphed are varied in proportion to $\sigma$, that is, if the vertical scale is lengthened and the horizontal scale is shortened in proportion to $\sigma$ (that is, to $\sqrt{N}$), then the normal curve retains a constant shape, namely, that of the standard normal curve,

$$y = \frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}$$

where $z = x/\sigma$. This is the basis for the statement on page 34 of the text that, if the scales are adjusted in this way, the limit of the binomial distribution as $N$ is increased is the standard normal curve.

The above shows that a binomial ordinate can be estimated 
by an ordinate of the normal curve and that a sum of binomial 
ordinates can be estimated from the sum of certain ordinates of
the normal curve. What is done in practice, however, is to 
estimate a sum of binomial ordinates over a given range from 
the area under the normal curve for that range, and it remains 
to be shown that an area under the normal curve is the approxi¬ 
mate equivalent of a sum of normal curve ordinates. This is 
demonstrated as follows: 

An ordinate of a normal curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

can always be represented by a rectangle of height $\frac{1}{\sqrt{2\pi}}e^{-x^2/2\sigma^2}$ and a base $1/\sigma$. Furthermore, if the $x$-scale is measured in standard deviation units, a succession of ordinates will be $1/\sigma$ units apart and the succession of rectangles representing them will all touch each other or, in other words, be contiguous. As $N$ is increased and $1/\sigma = 1/\sqrt{Np_1p_2}$ is decreased, rectangles over any range will be thinner and their area will approach the area of the standard normal curve for that range. Hence the area under the standard normal curve can be used as an estimate of the sum of a series of normal ordinates and thus of the corresponding series of binomial ordinates.
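The rectangle argument suggests a simple numerical check. In the sketch below each binomial ordinate at $N_1 = k$ is treated as a rectangle of unit base centered on $k$ (a base of $1/\sigma$ in standard-deviation units), so the matching normal area runs from $a - \frac12$ to $b + \frac12$. That half-unit adjustment is our reading of the contiguous-rectangle picture rather than a formula stated in the text, and the parameter values and function names are illustrative.

```python
import math

def binomial_ordinate(k: int, N: int, p1: float) -> float:
    return math.comb(N, k) * p1 ** k * (1 - p1) ** (N - k)

def standard_normal_cdf(z: float) -> float:
    # area under the standard normal curve to the left of z
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def binomial_sum(a: int, b: int, N: int, p1: float) -> float:
    # exact sum of the binomial ordinates for a <= N1 <= b
    return sum(binomial_ordinate(k, N, p1) for k in range(a, b + 1))

def normal_area(a: int, b: int, N: int, p1: float) -> float:
    # normal area over the contiguous rectangles centered on a, a + 1, ..., b
    mean = N * p1
    sigma = math.sqrt(N * p1 * (1 - p1))
    return standard_normal_cdf((b + 0.5 - mean) / sigma) - standard_normal_cdf((a - 0.5 - mean) / sigma)

for N in (20, 100, 1000):
    a, b = 45 * N // 100, 55 * N // 100
    print(N, round(binomial_sum(a, b, N, 0.5), 4), round(normal_area(a, b, N, 0.5), 4))
```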

D. Proof That the Relative Slope of the Symmetrical Binomial Polygon Is the Same at Any Abscissa Mid-point as That of a Normal Curve with the Same Mean as the Binomial Distribution and a Variance Equal to $\frac{N+1}{N}$ Times the Binomial Variance. 1

The ordinate of the symmetrical binomial distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{N!}{N_1!\,(N - N_1)!}\left(\frac{1}{2}\right)^N$$

and the ordinate at the next abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{N!}{(N_1 + 1)!\,(N - N_1 - 1)!}\left(\frac{1}{2}\right)^N$$

1 This and the next two proofs are based upon Karl Pearson’s analysis in the Philosophical Transactions of the Royal Society of London, Series A, Vol. 180 (1890), pp. 355 ff.





THE PEARSONIAN SYSTEM OF FREQUENCY CURVES 75

The difference between these ordinates is

$$\Delta y_{N_1} = y_{N_1+1} - y_{N_1} = \frac{N!}{N_1!\,(N - N_1 - 1)!}\left(\frac{1}{2}\right)^N\left(\frac{1}{N_1 + 1} - \frac{1}{N - N_1}\right)$$

and since the abscissa interval is $(N_1 + 1) - N_1 = 1$, the absolute slope of the side of the polygon joining the tops of these two ordinates is this difference $\Delta y_{N_1}$ (see Fig. 3, page 4).

The ordinate of the polygon at the abscissa mid-point $N_1 + \frac12$ is the average of the two ordinates at the abscissa points $N_1$ and $N_1 + 1$. Hence the ordinate at this mid-point is

$$y_{N_1+\frac12} = \frac12\left(y_{N_1+1} + y_{N_1}\right) = \frac{N!}{2\,N_1!\,(N - N_1 - 1)!}\left(\frac{1}{2}\right)^N\left(\frac{1}{N_1 + 1} + \frac{1}{N - N_1}\right)$$


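The relative slope at the abscissa mid-point $N_1 + \frac12$ is defined as the ratio of the absolute slope to the ordinate at that mid-point. Hence the relative slope of the polygon at the abscissa mid-point $N_1 + \frac12$ is

$$\frac{\Delta y_{N_1}}{y_{N_1+\frac12}} = \frac{y_{N_1+1} - y_{N_1}}{\frac12\left(y_{N_1+1} + y_{N_1}\right)} = \frac{(N - N_1) - (N_1 + 1)}{\frac12\left[(N - N_1) + (N_1 + 1)\right]} = \frac{N - 2N_1 - 1}{\frac12(N + 1)} = -\frac{2\left(N_1 - \frac{N}{2} + \frac12\right)}{\frac12(N + 1)}$$

If $x$ is set equal to $N_1 + \frac12 - \frac{N}{2}$, that is, to $N_1 + \frac12$ minus the mean of $N_1$, and if $k^2$ is set equal to $\frac{N+1}{4}$, this expression for the relative slope at $N_1 + \frac12$ becomes

$$\frac{\Delta y_{N_1}}{y_{N_1+\frac12}} = -\frac{2x}{2k^2}$$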

The relative slope of the normal curve, $y = \frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/2\sigma^2}$, at any abscissa point $x = X - \bar{X}$ is

$$\frac{1}{y}\frac{dy}{dx} = \frac{d\log_e y}{dx} = \frac{d\left(-\frac{x^2}{2\sigma^2} - \log_e\sigma\sqrt{2\pi}\right)}{dx} = -\frac{2x}{2\sigma^2}$$

Hence $-\frac{2x}{2k^2}$ is the relative slope of a normal curve at the point $x$ whose standard deviation $= k = \sqrt{\frac{N+1}{4}}$. Since the standard deviation of the symmetrical binomial distribution is $\sqrt{N/4}$, the normal curve that has the same relative slope as the polygon of a symmetrical binomial distribution at any abscissa mid-point is the one that has the same mean as the mean of the binomial distribution and a variance equal to $\frac{N+1}{N}$ times the variance of the binomial distribution. If $N$ is large, the two variances are practically the same, for then $\frac{N+1}{N}$ is practically equal to 1.
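Because the algebra of this proof is exact, the equality of relative slopes can be confirmed with exact rational arithmetic. The sketch below uses Python's `fractions` module; $N = 20$ and the function names are arbitrary choices. It compares the polygon's relative slope at each mid-point with $-2x/2k^2$ for $k^2 = (N + 1)/4$, and shows that the binomial variance $N/4$ does not give the same agreement.

```python
from fractions import Fraction
from math import comb

def sym_binomial(n1: int, N: int) -> Fraction:
    # ordinate N!/(N1!(N - N1)!) * (1/2)^N as an exact fraction
    return Fraction(comb(N, n1), 2 ** N)

def polygon_relative_slope(n1: int, N: int) -> Fraction:
    # absolute slope of the polygon side divided by the mid-ordinate at N1 + 1/2
    y0, y1 = sym_binomial(n1, N), sym_binomial(n1 + 1, N)
    return (y1 - y0) / ((y1 + y0) / 2)

def normal_relative_slope(x: Fraction, variance: Fraction) -> Fraction:
    # relative slope -2x / (2 * variance) of a normal curve at deviation x
    return -2 * x / (2 * variance)

N = 20
k2 = Fraction(N + 1, 4)            # variance (N + 1)/4 from the proof
binomial_variance = Fraction(N, 4)
for n1 in range(N):
    x = n1 + Fraction(1, 2) - Fraction(N, 2)
    print(n1, polygon_relative_slope(n1, N), normal_relative_slope(x, k2))
```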

E. Proof That the Relative Slope of the General Binomial Polygon Is the Same at Any Abscissa Mid-point as That of a Pearsonian Type III Curve. The ordinate of the general binomial distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{N!}{N_1!\,(N - N_1)!}\,p_1^{N_1}p_2^{N - N_1}$$

and the ordinate at the abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{N!}{(N_1 + 1)!\,(N - N_1 - 1)!}\,p_1^{N_1+1}p_2^{N - N_1 - 1}$$

The difference between these ordinates is

$$\Delta y_{N_1} = y_{N_1+1} - y_{N_1} = \frac{N!\,p_1^{N_1}p_2^{N - N_1 - 1}}{N_1!\,(N - N_1 - 1)!}\left(\frac{p_1}{N_1 + 1} - \frac{p_2}{N - N_1}\right)$$

and since the distance between the abscissa points $N_1 + 1$ and $N_1$ is 1, the absolute slope of the side of the polygon joining the tops of these two ordinates is $\Delta y_{N_1}$.

The ordinate of the polygon at the abscissa mid-point $N_1 + \frac12$ is the average of the two ordinates at the abscissa points $N_1$ and $N_1 + 1$, that is,

$$y_{N_1+\frac12} = \frac{N!\,p_1^{N_1}p_2^{N - N_1 - 1}}{2\,N_1!\,(N - N_1 - 1)!}\left(\frac{p_1}{N_1 + 1} + \frac{p_2}{N - N_1}\right)$$

The relative slope at the abscissa mid-point $N_1 + \frac12$ is the ratio of the absolute slope to the ordinate at that mid-point. This has the value

$$\frac{\Delta y_{N_1}}{y_{N_1+\frac12}} = \frac{y_{N_1+1} - y_{N_1}}{\frac12\left(y_{N_1+1} + y_{N_1}\right)} = \frac{2\left[(N - N_1)p_1 - (N_1 + 1)p_2\right]}{(N - N_1)p_1 + (N_1 + 1)p_2} = \frac{2(Np_1 - N_1 - p_2)}{Np_1 + (p_2 - p_1)N_1 + p_2}$$

which may be put in terms of the abscissa mid-point $N_1 + \frac12$ by adding and subtracting $\frac12$ in the numerator and adding and subtracting $\frac12(p_2 - p_1)$ in the denominator, as follows:

$$\frac{2\left[Np_1 - \left(N_1 + \frac12\right) - p_2 + \frac12\right]}{Np_1 + (p_2 - p_1)\left(N_1 + \frac12\right) + p_2 - \frac12(p_2 - p_1)}$$

If now $x$ is set equal to $\left(N_1 + \frac12\right) - Np_1 + p_2 - \frac12$, that is, to the deviation of $N_1 + \frac12$ from what is practically the mode of the binomial distribution (see page 68), the foregoing expression for the relative slope can be put in the form

$$\text{Relative slope} = \frac{-2x}{Np_1 + (p_2 - p_1)\left(x + Np_1 - p_2 + \frac12\right) + p_2 - \frac12(p_2 - p_1)}$$

which, upon making use of the fact that $p_1 + p_2 = 1$, reduces to

$$\text{Relative slope} = \frac{-2x}{2Np_1p_2 + 2p_1p_2 + (p_2 - p_1)x} = \frac{-\frac{2}{p_2 - p_1}\,x}{\frac{2p_1p_2(N + 1)}{p_2 - p_1} + x} = \frac{-kx}{a + x}$$

where 1

$$k = \frac{2}{p_2 - p_1} \qquad \text{and} \qquad a = \frac{2p_1p_2(N + 1)}{p_2 - p_1}$$
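The reduction to $-kx/(a + x)$ can likewise be checked exactly. In this sketch the values $N = 12$, $p_1 = \frac{3}{10}$ and the function names are arbitrary; $p_2 \ne p_1$ is required, since $k$ and $a$ are undefined when $p_1 = p_2$.

```python
from fractions import Fraction
from math import comb

def binomial_ordinate(n1: int, N: int, p1: Fraction) -> Fraction:
    p2 = 1 - p1
    return comb(N, n1) * p1 ** n1 * p2 ** (N - n1)

def polygon_relative_slope(n1: int, N: int, p1: Fraction) -> Fraction:
    # absolute slope divided by the mid-ordinate at N1 + 1/2
    y0, y1 = binomial_ordinate(n1, N, p1), binomial_ordinate(n1 + 1, N, p1)
    return (y1 - y0) / ((y1 + y0) / 2)

def type_iii_relative_slope(n1: int, N: int, p1: Fraction) -> Fraction:
    p2 = 1 - p1
    # deviation of N1 + 1/2 from N p1 - p2 + 1/2, practically the mode
    x = (n1 + Fraction(1, 2)) - N * p1 + p2 - Fraction(1, 2)
    k = 2 / (p2 - p1)
    a = 2 * p1 * p2 * (N + 1) / (p2 - p1)
    return -k * x / (a + x)

N, p1 = 12, Fraction(3, 10)
for n1 in range(N):
    print(n1, polygon_relative_slope(n1, N, p1), type_iii_relative_slope(n1, N, p1))
```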


To find the curve whose relative slope at the point x is that of 
the binomial polygon, it is necessary merely to integrate. This 
is done as follows: 


1 The k and a used in this proof are the same as the boldface k and a used in the body of the text. Italic k and a are used here to simplify the printing of the mathematical derivation; they do not signify sample values but represent population values just as k and a do in the text.
Set

$$\frac{1}{y}\frac{dy}{dx} = \frac{-kx}{a + x}$$

But

$$\frac{1}{y}\frac{dy}{dx} = \frac{d\log_e y}{dx}$$

and, by division,

$$\frac{-kx}{a + x} = -k + \frac{ka}{a + x}$$

Hence,

$$\frac{d\log_e y}{dx} = -k + \frac{ka}{a + x}$$

and, upon integrating,

$$\log_e y = -kx + ka\log_e(a + x) + \log_e c$$

Therefore,

$$y = c\,(a + x)^{ka}e^{-kx} = c'\left(1 + \frac{x}{a}\right)^{ka}e^{-kx}$$

where $c$ is the constant of integration and $c' = c\,a^{ka}$.

The value of $c'$ is determined from the condition that the area under the curve must equal 1. In doing this, it is to be noted that negative frequencies are inadmissible, so that $x$ cannot be less than $-a$. It is therefore the area under that part of the curve from $x = -a$ to $\infty$ that is to be equal to 1, that is, $\int_{-a}^{\infty} y\,dx = 1$.

To carry out this integration, multiply $y$ by $\frac{(ka)^{ka}}{(ka)^{ka}}$, which, of course, equals 1. This puts the integral in the form

$$\int_{-a}^{\infty} y\,dx = \int_{-a}^{\infty}\frac{c'e^{ka}}{(ka)^{ka}}\left[k(a + x)\right]^{ka}e^{-k(a + x)}\,dx$$

If $z$ is set equal to $k(a + x)$, this becomes

$$\int_{-a}^{\infty} y\,dx = \frac{c'e^{ka}}{k\,(ka)^{ka}}\int_0^{\infty} z^{ka}e^{-z}\,dz$$

(Note that $dz = k\,dx$.) But $\int_0^{\infty} z^{ka}e^{-z}\,dz$ is by definition $\Gamma(ka + 1)$, called gamma of $ka + 1$, and equals $(ka)!$.* Consequently, since

* Integration of $\int_0^{\infty} z^{m-1}e^{-z}\,dz$ by parts gives
$\int_{-a}^{\infty} y\,dx$ must equal 1,

$$c' = \frac{k\,(ka)^{ka}}{e^{ka}\,(ka)!} = \frac{(ka)^{ka+1}}{a\,e^{ka}\,(ka)!}$$

Since $y = c'$ when $x = 0$ and since $x = 0$ is the mode of the distribution (for $dy/dx = 0$ when $x = 0$), $c'$ is designated as $y_0$ and equals the height of the curve at the mode. Therefore, the formula for the curve is that given in the text (page 49), viz.,

$$y = y_0\left(1 + \frac{x}{a}\right)^{ka}e^{-kx}$$

the origin being the mode of the distribution. This is Pearson's type III curve. Taking logarithms of both sides yields

$$\log_{10} y = \log_{10} y_0 + ka\log_{10}\left(1 + \frac{x}{a}\right) - kx\log_{10} e$$

which is the form given on page 49.
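A numerical check of the normalizing constant follows. The sketch evaluates $y_0 = (ka)^{ka+1}/[a\,e^{ka}(ka)!]$ through Python's `math.lgamma`, using $(ka)! = \Gamma(ka + 1)$ as in the footnote, and then integrates the type III curve from $-a$ upward by the midpoint rule. The values of $k$ and $a$ come from the binomial case $N = 12$, $p_1 = .3$ and are illustrative; the step count and the upper limit of 25 are assumptions chosen so that the neglected tail is negligible.

```python
import math

def type_iii_y0(k: float, a: float) -> float:
    # y0 = (ka)^(ka+1) / (a e^(ka) (ka)!), with (ka)! = Gamma(ka + 1); log scale avoids overflow
    ka = k * a
    log_y0 = (ka + 1) * math.log(ka) - math.log(a) - ka - math.lgamma(ka + 1)
    return math.exp(log_y0)

def type_iii_density(x: float, k: float, a: float) -> float:
    # y = y0 (1 + x/a)^(ka) e^(-kx) for x > -a (origin at the mode), zero otherwise
    if x <= -a:
        return 0.0
    return type_iii_y0(k, a) * math.exp(k * a * math.log1p(x / a) - k * x)

def area(k: float, a: float, upper: float, steps: int = 20000) -> float:
    # midpoint rule from x = -a to x = upper
    h = (upper + a) / steps
    return sum(type_iii_density(-a + (i + 0.5) * h, k, a) for i in range(steps)) * h

N, p1 = 12, 0.3
p2 = 1 - p1
k = 2 / (p2 - p1)
a = 2 * p1 * p2 * (N + 1) / (p2 - p1)
print(round(area(k, a, upper=25.0), 6))
```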

F. Proof That the Relative Slope of the Hypergeometrical Polygon Is Given at Any Abscissa Mid-point $X = N_1 + \frac12$ by a Formula of the Type Relative Slope $= -\dfrac{X - a}{b_0 + b_1X + b_2X^2}$. By Eq. (4) of Chap. IV (see page 54), the ordinate of the hypergeometrical distribution at any abscissa point $N_1$ is

$$y_{N_1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1)!}$$


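$$\int_0^{\infty} z^{m-1}e^{-z}\,dz = \Big[-z^{m-1}e^{-z}\Big]_0^{\infty} + (m - 1)\int_0^{\infty} z^{m-2}e^{-z}\,dz$$

But the first term is zero both for $z = 0$ and $z = \infty$, and therefore

$$\int_0^{\infty} z^{m-1}e^{-z}\,dz = (m - 1)\int_0^{\infty} z^{m-2}e^{-z}\,dz$$

If $m$ is a positive integer, repetition gives

$$\int_0^{\infty} z^{m-1}e^{-z}\,dz = (m - 1)(m - 2)\cdots 1\int_0^{\infty} e^{-z}\,dz$$

or, since $\int_0^{\infty} e^{-z}\,dz = \Big[-e^{-z}\Big]_0^{\infty} = 1$,

$$\int_0^{\infty} z^{m-1}e^{-z}\,dz = (m - 1)!$$

When $m$ is not a positive integer, $\int_0^{\infty} z^{m-1}e^{-z}\,dz$ is taken as the definition of $(m - 1)!$. The function $\int_0^{\infty} z^{m-1}e^{-z}\,dz$ is called the gamma function of $m$ and is written $\Gamma(m)$. Thus $\Gamma(m) = (m - 1)!$, $\Gamma(m + 1) = m!$, etc. (see p. 254).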
and its ordinate at the abscissa point $N_1 + 1$ is

$$y_{N_1+1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1 - 1)!\,(p_2S - N + N_1 + 1)!\,S!\,(N_1 + 1)!\,(N - N_1 - 1)!}$$


The difference between these two ordinates is

$$y_{N_1+1} - y_{N_1} = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1 - 1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1 - 1)!}\left[\frac{1}{(p_2S - N + N_1 + 1)(N_1 + 1)} - \frac{1}{(p_1S - N_1)(N - N_1)}\right]$$

Since the distance between the abscissa points is 1, this difference is the absolute slope of the side of the polygon joining the tops of the two ordinates.

The ordinate at the abscissa mid-point $N_1 + \frac12$ is equal to $\frac12\left(y_{N_1+1} + y_{N_1}\right)$, which equals

$$\frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{2\,(p_1S - N_1 - 1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1 - 1)!}\left[\frac{1}{(p_2S - N + N_1 + 1)(N_1 + 1)} + \frac{1}{(p_1S - N_1)(N - N_1)}\right]$$

and the relative slope is the ratio of the absolute slope to the value of this mid-ordinate. That is,

$$\text{Relative slope} = \frac{2\left[(p_1S - N_1)(N - N_1) - (p_2S - N + N_1 + 1)(N_1 + 1)\right]}{(p_1S - N_1)(N - N_1) + (p_2S - N + N_1 + 1)(N_1 + 1)}$$

which reduces to

$$\text{Relative slope} = \frac{2\left[N + Np_1S - p_2S - 1 - N_1(S + 2)\right]}{Np_1S + p_2S + 1 - N + N_1\left[(p_2 - p_1)S - 2N + 2\right] + 2N_1^2}$$

If now $X$ is set equal to $N_1 + \frac12$, the foregoing expression for the relative slope may be put in the form

$$\text{Relative slope} = -\frac{X + \frac12 - \frac{(N + 1)(1 + p_1S)}{S + 2}}{\frac{2Np_1S + S + 1}{4(S + 2)} + \frac{\left[(p_2 - p_1)S - 2N\right]X}{2(S + 2)} + \frac{X^2}{S + 2}}$$

which is equivalent to

$$\text{Relative slope} = -\frac{X - a}{b_0 + b_1X + b_2X^2}$$

where

$$a = \frac{(N + 1)(1 + p_1S)}{S + 2} - \frac12, \qquad b_0 = \frac{2Np_1S + S + 1}{4(S + 2)}, \qquad b_1 = \frac{(p_2 - p_1)S - 2N}{2(S + 2)}, \qquad b_2 = \frac{1}{S + 2}$$
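Since every step of this proof is exact algebra, the final form can be checked against the hypergeometrical ordinates with exact fractions. The sketch below takes $S = 52$, $p_1S = 13$, $N = 10$ (our reading of the card example discussed in Section G) as an assumed test case; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def hyper_ordinate(n1: int, N: int, S: int, K: int) -> Fraction:
    # probability of n1 marked items in a sample of N drawn without replacement
    # from S items of which K = p1 S are marked (Eq. (4) of Chap. IV)
    return Fraction(comb(K, n1) * comb(S - K, N - n1), comb(S, N))

def polygon_relative_slope(n1: int, N: int, S: int, K: int) -> Fraction:
    y0, y1 = hyper_ordinate(n1, N, S, K), hyper_ordinate(n1 + 1, N, S, K)
    return (y1 - y0) / ((y1 + y0) / 2)

def formula_relative_slope(n1: int, N: int, S: int, K: int) -> Fraction:
    # -(X - a) / (b0 + b1 X + b2 X^2) with X = N1 + 1/2
    p1 = Fraction(K, S)
    p2 = 1 - p1
    X = n1 + Fraction(1, 2)
    a = (N + 1) * (1 + p1 * S) / (S + 2) - Fraction(1, 2)
    b0 = (2 * N * p1 * S + S + 1) / (4 * (S + 2))
    b1 = ((p2 - p1) * S - 2 * N) / (2 * (S + 2))
    b2 = Fraction(1, S + 2)
    return -(X - a) / (b0 + b1 * X + b2 * X ** 2)

N, S, K = 10, 52, 13
for n1 in range(N):
    print(n1, polygon_relative_slope(n1, N, S, K), formula_relative_slope(n1, N, S, K))
```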

G. The Criterion for the Hypergeometrical Distribution. The character of the hypergeometrical distribution will depend on the value of the discriminant $b_1^2 - 4b_0b_2$. From the results of the previous section, this is seen to be

$$b_1^2 - 4b_0b_2 = \frac{\left[(p_2 - p_1)S - 2N\right]^2 - 4(2Np_1S + S + 1)}{4(S + 2)^2}$$

The value of the discriminant is thus related to the values of $p_1$, $p_2$, $N$, and $S$. It is positive if $N/S$ lies outside the limits

$$\frac12\left\{1 \pm 2\sqrt{\left(p_1 + \frac1S\right)\left(p_2 + \frac1S\right)}\right\}$$

and it is negative if $N/S$ lies within these limits. In the first case, the hypergeometrical distribution is approximated by a type VI curve (see pages 58, 135); in the second case, by a type IV curve.

In the example employed in the text, $p_1 = \frac14$, $p_2 = \frac34$, $S = 52$, and $N = 10$. Hence, $N/S = \frac{10}{52} = \frac{5}{26}$, or about .192. On the other hand,

$$\frac12\left\{1 \pm 2\sqrt{\left(\frac14 + \frac{1}{52}\right)\left(\frac34 + \frac{1}{52}\right)}\right\} = \frac12 \pm \frac{\sqrt{35}}{13}, \quad \text{that is, } .045 \text{ and } .955$$

Since $N/S$ $\left(= \frac{5}{26}\right)$ lies within these limits, the hypergeometrical distribution given by these values of $p_1$, $p_2$, $S$, and $N$ is approximated by a type IV curve.

It will be noted that $N$, $p_1$, and $S$ are all positive quantities, so that the second term of the discriminant (without the minus sign) must be positive. That is, the roots of the equation $b_0 + b_1x + b_2x^2 = 0$ cannot be real and of opposite sign. Consequently, Pearsonian curves of type I cannot be derived from a hypergeometrical distribution (see footnote to page 58).
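The criterion can be sketched in a few lines. The functions below implement the discriminant $b_1^2 - 4b_0b_2$ and the limits $\frac12\{1 \pm 2\sqrt{(p_1 + 1/S)(p_2 + 1/S)}\}$ as given above; the function names are illustrative, and the example values assume $p_1 = \frac14$ for the text's example.

```python
import math

def discriminant(p1: float, S: int, N: int) -> float:
    # b1^2 - 4 b0 b2 for the hypergeometrical relative-slope formula
    p2 = 1 - p1
    return (((p2 - p1) * S - 2 * N) ** 2 - 4 * (2 * N * p1 * S + S + 1)) / (4 * (S + 2) ** 2)

def type_iv_limits(p1: float, S: int) -> tuple:
    # N/S inside these limits gives a negative discriminant (type IV); outside, type VI
    p2 = 1 - p1
    half_width = math.sqrt((p1 + 1 / S) * (p2 + 1 / S))
    return 0.5 - half_width, 0.5 + half_width

def pearson_type(p1: float, S: int, N: int) -> str:
    return "IV" if discriminant(p1, S, N) < 0 else "VI"

p1, S, N = 0.25, 52, 10
lower, upper = type_iv_limits(p1, S)
print(f"limits: {lower:.3f} to {upper:.3f};  N/S = {N / S:.3f};  type {pearson_type(p1, S, N)}")
```

For these values the sketch reports limits of about .045 and .955, with $N/S \approx .192$ falling between them, hence type IV.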








CHAPTER V 

THE GRAM-CHARLIER SYSTEM OF FREQUENCY CURVES 


The introduction of a possible variation in the contribution 
of a causal factor at the end rather than at the beginning of the 
argument may be considered a weak point of the Pearsonian 
analysis. It is therefore of interest to consider another approach 
to the theory of frequency curves that assumes variable contri¬ 
butions from the start. Such an approach is that which gives 
rise to the Gram-Charlier system of frequency curves. 1 

Derivation of the Gram-Charlier Formula. The assumptions 
on which the Gram-Charlier system of frequency curves is based 
are similar in many respects to those from which the skew 
binomial and Pearson’s type III curve were derived. The first 
fundamental assumption is that the fluctuations in a given 
variable X are the algebraic sum of the contributions of a number 
of causal factors. Thus the deviations of $X$ from some central value are assumed to be equal to $\epsilon_1 + \epsilon_2 + \cdots + \epsilon_N$, where the $\epsilon$'s may take on either positive or negative values. The significance of this assumption is that the contributions of the various
causes are additive rather than multiplicative or related in some 
more complex way. The second principal assumption is that 
the contribution of each cause is independent of the contributions 
of other causes. 

The two assumptions above are essentially the same as those made by Pearson in the derivation of his type III curve; but a third fundamental assumption differs from Pearson's. In the case of the binomial distribution it was assumed that each contributory cause could add or subtract only a fixed amount to the variable (only an H or a T was possible). Under the Gram-Charlier system, it is assumed that the contributions of each causal factor can take on various values, the probability of any given value being given by a distribution the form of which is fixed, but not necessarily known. These distributions of probability are allowed to vary from cause to cause; they may be discrete or continuous and are not subject to any restrictions other than that the probability of very large positive or negative contributions shall be practically zero, i.e., that the distributions taper off to zero in both directions.

1 So called because J. P. Gram and C. V. L. Charlier were principally responsible for its development. See, for example, J. P. Gram, Om Rækkeudviklinger (Copenhagen, 1879) (Doctor's dissertation); and “Über die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate,” Journal für die reine und angewandte Mathematik, Vol. 94 (1883), pp. 41-73. Also see C. V. L. Charlier, “Über das Fehlergesetz,” Arkiv för Matematik, Astronomi och Fysik, Vol. 2, No. 8 (1905), pp. 1-9; and “Über die Darstellung willkürlicher Functionen,” Arkiv för Matematik, Astronomi och Fysik, Vol. 2, No. 20 (1905), pp. 1-35.

On the basis of these assumptions the Gram-Charlier analysis 1 shows that, if the number of contributory causes is relatively large, then the distribution of the resultant variable $X$ bears certain approximate relationships to the distributions of the individual contributory elements $\epsilon$. In setting up these relationships it makes use of certain quantities, called “cumulants” or “semi-invariants,” that are directly related to the moments of a distribution. Thus, if $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are the first four moments of a distribution about its mean, then $k_1 = \mu_1$, $k_2 = \mu_2$, $k_3 = \mu_3$, and $k_4 = \mu_4 - 3\mu_2^2$, where the $k$'s are the first four cumulants of the distribution. Both $k_1$ and $\mu_1$ are 0, of course, when the moments are measured about the mean. It is obvious that $k_2 = \sigma^2$, that $k_3$ is related to the skewness of the distribution, and that $k_4$ is related to its kurtosis. When $k_3$ is 0, the distribution is symmetrical; when $k_4$ is 0, its kurtosis is 3.

By adopting certain mathematical approximations, the Gram-Charlier analysis shows that the cumulants of the distribution of $X$ are the sum of the cumulants of the distributions of the individual contributory elements. That is, if $k_{12}$ is the second cumulant of the distribution of the first contributory element $\epsilon_1$, $k_{22}$ the second cumulant of the second contributory element $\epsilon_2$, $k_{23}$ the third cumulant of the second contributory element, etc., and if $K_2$, $K_3$, and $K_4$ are the cumulants of the distribution of $X$, then

$$K_2 = k_{12} + k_{22} + k_{32} + \cdots + k_{N2}$$

$$K_3 = k_{13} + k_{23} + k_{33} + \cdots + k_{N3}$$

$$K_4 = k_{14} + k_{24} + k_{34} + \cdots + k_{N4}$$
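A small exact check of the moment-cumulant relations and of this additivity, for two independent discrete contributions, is sketched below. The two-point and three-point distributions are arbitrary assumptions chosen for illustration, and the function names are ours. The distribution of the sum is obtained by direct convolution, and exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

def cumulants_2_to_4(dist: dict) -> tuple:
    # dist maps each value to its probability; returns (k2, k3, k4)
    mean = sum(v * p for v, p in dist.items())
    # central moments mu_2, mu_3, mu_4
    mu = {r: sum(p * (v - mean) ** r for v, p in dist.items()) for r in (2, 3, 4)}
    # k2 = mu2, k3 = mu3, k4 = mu4 - 3 mu2^2
    return mu[2], mu[3], mu[4] - 3 * mu[2] ** 2

def distribution_of_sum(d1: dict, d2: dict) -> dict:
    # distribution of e1 + e2 when e1 and e2 are independent
    out = {}
    for (v1, q1), (v2, q2) in product(d1.items(), d2.items()):
        out[v1 + v2] = out.get(v1 + v2, 0) + q1 * q2
    return out

e1 = {0: Fraction(3, 4), 1: Fraction(1, 4)}                      # a positively skewed contribution
e2 = {-2: Fraction(1, 6), 0: Fraction(1, 2), 1: Fraction(1, 3)}  # a second, independent contribution
K = cumulants_2_to_4(distribution_of_sum(e1, e2))
k_sum = tuple(c1 + c2 for c1, c2 in zip(cumulants_2_to_4(e1), cumulants_2_to_4(e2)))
print(K, k_sum)
```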

1 For a fuller discussion of the Gram-Charlier analysis see the Appendix, 
pp. 92-99. 
These relationships form the basis of the Gram-Charlier analysis. Suppose, it is argued, that the variance $K_2$ of the resultant variable $X$ is finite, as it is in most practical cases, and suppose that the variances, $k_2$'s, of the individual contributory elements are all of about the same order of magnitude. Then, since there are $N$ contributory elements, the variance $k_2$ of each of them is of about the size of $K_2/N$, or, as the mathematicians say, of the order of $1/N$, and its standard deviation $\sqrt{k_2}$ is of the order of $1/\sqrt{N}$. This means that the average variation in each contributory element is of order $1/\sqrt{N}$. Hence the third and fourth moments (and therefore the third and fourth cumulants), which involve the third and fourth powers of this variation, will be of order $1/N^{3/2}$ and $1/N^2$, respectively. Consequently, the third and fourth cumulants of $X$, $K_3$ and $K_4$, which are equal, respectively, to the sum of the third and fourth cumulants of the $N$ contributory elements, will be of order $1/\sqrt{N}$ and $1/N$.
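To illustrate these orders of magnitude, suppose (purely as an assumption for this sketch) that the $N$ contributions are identical independent two-point variables equal to 1 with probability $p$ and 0 otherwise, each rescaled by $1/\sqrt{N}$ so that $K_2$ stays fixed. Using the standard rule that multiplying a variable by $c$ multiplies its $r$th cumulant by $c^r$, together with the additivity just stated, $K_3\sqrt{N}$ and $K_4N$ stay constant as $N$ grows. The function names are illustrative.

```python
import math

def cumulants_of_one(p: float) -> tuple:
    # (k2, k3, k4) of a single 0/1 contribution equal to 1 with probability p
    q = 1 - p
    return p * q, p * q * (q - p), p * q * (1 - 6 * p * q)

def cumulants_of_sum(N: int, p: float) -> tuple:
    # each contribution is scaled by c = 1/sqrt(N), which multiplies its r-th cumulant by c**r;
    # the cumulants of the N independent contributions are then added
    k2, k3, k4 = cumulants_of_one(p)
    c = 1 / math.sqrt(N)
    return N * c ** 2 * k2, N * c ** 3 * k3, N * c ** 4 * k4

for N in (1, 4, 16, 64, 256):
    K2, K3, K4 = cumulants_of_sum(N, 0.1)
    print(f"N={N:4d}  K2={K2:.4f}  K3={K3:.5f}  K3*sqrt(N)={K3 * math.sqrt(N):.5f}  K4*N={K4 * N:.5f}")
```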

It therefore follows that, if N is large, the third and fourth 
cumulants of X will be approximately 0. This means that the 
distribution of X will be practically symmetrical and its kurtosis 
will be close to 3. The Gram-Charlier analysis shows further 
that under these circumstances the form of the distribution of X 
will be that of the normal curve. 

In summary, then, the Gram-Charlier analysis shows that if variation in a quantity $X$ is the sum of a large number of elementary independent variations $\epsilon$, and if these elementary variations are all of about the same order of magnitude, then the distribution of $X$ will be approximately normal in form.

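If $N$ is not large enough to make terms of order $1/\sqrt{N}$ or $1/N$ negligible but is still large enough to make terms of higher order (such as terms of order $1/N^2$) practically zero, then the Gram-Charlier analysis shows that the form of the distribution of $X$ will be given approximately by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}\left[1 + \frac{A}{\sigma^3}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) + \frac{B}{\sigma^4}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)\right] \qquad (1)$$

where

$$x = X - \bar{X}, \qquad \sigma^2 = K_2, \qquad A = -\frac{K_3}{3!} = -\frac{\mu_3}{3!}, \qquad B = \frac{K_4}{4!} = \frac{\mu_4 - 3\mu_2^2}{4!}$$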
This is the more general Gram-Charlier formula for a frequency curve. The normal curve is the special case for which $A$ and $B$ both equal 0; all other curves are nonnormal.
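Equation (1) translates directly into a function. The sketch below is in Python; the parameter values $\sigma = 2$, $K_3 = .8$, $K_4 = 1.6$ and the function names are arbitrary assumptions. It evaluates the curve and checks, by midpoint-rule integration, that it has unit area, variance $K_2 = \sigma^2$, third moment $K_3$, and fourth cumulant $K_4$. It also shows that a large $A$ term on its own can push the curve below zero, the kind of negative frequency the text notes in connection with Figs. 18, 22, and 24.

```python
import math

def gram_charlier_a(x: float, sigma: float, A: float, B: float) -> float:
    # Eq. (1): x = X - mean, A = -K3/3!, B = K4/4!
    z = x / sigma
    normal = math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
    a_term = (A / sigma ** 3) * (3 * z - z ** 3)
    b_term = (B / sigma ** 4) * (3 - 6 * z ** 2 + z ** 4)
    return normal * (1 + a_term + b_term)

def moment(r: int, sigma: float, A: float, B: float, half_width: float = 12.0, steps: int = 24000) -> float:
    # r-th moment about x = 0 by the midpoint rule over -half_width*sigma .. +half_width*sigma
    h = 2 * half_width * sigma / steps
    xs = (-half_width * sigma + (i + 0.5) * h for i in range(steps))
    return sum(x ** r * gram_charlier_a(x, sigma, A, B) for x in xs) * h

sigma, K3, K4 = 2.0, 0.8, 1.6
A, B = -K3 / math.factorial(3), K4 / math.factorial(4)
area = moment(0, sigma, A, B)
mu2, mu3, mu4 = (moment(r, sigma, A, B) for r in (2, 3, 4))
print(round(area, 6), round(mu2, 6), round(mu3, 6), round(mu4 - 3 * mu2 ** 2, 6))
```

The last check in the tests uses $A/\sigma^3 = -.2$ with $B = 0$, the value used for curve II of Fig. 17; at $(X - \bar{X})/\sigma = -3$ the bracket in Eq. (1) is then negative.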

Components of a Gram-Charlier Curve. The effect of the additional $A$ and $B$ terms on the shape of the distribution of $X$ is illustrated in Fig. 17. Here the ordinary normal curve is represented by curve I. If $A$ is negative, the effect of the $A$ term is to subtract from the normal ordinate, for positive values of $(X - \bar{X})/\sigma$, an increasing percentage of that ordinate up to $(X - \bar{X})/\sigma = 1$, then a decreasing percentage up to $(X - \bar{X})/\sigma = \sqrt{3}$, and thereafter to add a small percentage. The opposite is true for negative values of $(X - \bar{X})/\sigma$. The fluctuations in the amounts added and subtracted are illustrated by curve II of Fig. 17, for which $A/\sigma^3 = -.2$. Since, when $A$ is negative, the effect of the $A$ term is to subtract from the right of the mean and to add to the left of the mean (at least in its immediate neighborhood), the result is a transformation of the normal curve into a positively skewed curve (cf. Fig. 18). If $A$ is positive, the effect of the $A$ term is to add to the right of the mean and to subtract from the left (cf. Fig. 21), at least within the immediate neighborhood, and thus to transform the normal curve into a negatively skewed curve (cf. Fig. 22).

Fig. 17. Components of a Gram-Charlier frequency curve.
This dependence of the skewness of the distribution of $X$ on the sign of $A$ was to be expected. For, as indicated above, $A = -K_3/3!$, and $K_3$ itself is a sum of the cumulants $k_{13}$, $k_{23}$, etc., which are the third moments of the distributions of the individual contributions. Consequently, if the distributions of the individual contributions are all positively skewed, their third moments, and hence $k_{13}$, $k_{23}$, etc., will all be positive; $K_3$, which equals $k_{13} + k_{23} + k_{33} + \cdots + k_{N3}$, will be positive; $A$ will be negative; and the distribution of $X$ will be positively skewed. That is, if the distributions of the individual contributions, $\epsilon$, are all positively skewed, then $X$ itself will be positively skewed. Just the opposite is true if the individual contributions are all negatively skewed. If some of the individual contributions are positively skewed and others are negatively skewed, then the value of $K_3$, and hence the skewness of $X$, will depend on whether the positive or negative influences are predominant. Since the contributions are presumed to be of approximately the same order of magnitude, the result will depend primarily on the relative number of positively skewed and negatively skewed contributions.

Fig. 18. Combination of curves I and II shown in Fig. 17.

Fig. 19. Combination of curves I and III shown in Fig. 17.

Fig. 20. Combination of curves I, II, and III shown in Fig. 17.
The effect of the $B$ term on the distribution of $X$ is shown graphically in Figs. 17, 19, 21, and 23. If $B$ is positive, the effect of the $B$ term is to add a maximum percentage to the ordinate of the normal curve at $(X - \bar{X})/\sigma = 0$. This percentage decreases in size until $(X - \bar{X})/\sigma$ equals $-.74$ and $+.74$, respectively, from which points an increasing percentage is subtracted from the normal ordinates. The maximum percentage subtraction is attained at $(X - \bar{X})/\sigma = -\sqrt{3}$ and $+\sqrt{3}$, after which a decreasing percentage is subtracted until $(X - \bar{X})/\sigma = -2.33$ and $+2.33$. From those points on, a small percentage is added to the normal ordinates.

These fluctuations in the percentages added and subtracted are illustrated by curve III of Fig. 17, for which $B/\sigma^4$ is taken equal to $+.10$. When $B$ is positive, the effect of the $B$ term is to add to the normal curve in the immediate vicinity of the mean, to subtract from it at intermediate distances from the mean, and then to add to it again on the tails of the distribution; the result is to give the final curve a greater than normal peakedness. That is, if $B$ is positive, the distribution of $X$ will have a coefficient of kurtosis greater than 3 (cf. Fig. 19). Just the opposite is true if $B$ is negative. In this case subtractions are made from the normal curve within the immediate vicinity of the mean, additions are made at intermediate distances, and subtractions are again made out on the tails (cf. Fig. 21). The result is to produce less than normal peakedness (cf. Fig. 23). If $B$ is negative, therefore, the distribution of $X$ will have a coefficient of kurtosis less than 3.

Fig. 22. Combination of curves I and II' shown in Fig. 21.
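The turning points and zeros quoted in the last two paragraphs follow from the two polynomial factors in Eq. (1): $3z - z^3$ for the $A$ term and $3 - 6z^2 + z^4$ for the $B$ term, where $z = (X - \bar{X})/\sigma$. A brief check follows; the grid spacing and function names are arbitrary choices.

```python
import math

def a_factor(z: float) -> float:
    # percentage change due to the A term is (A / sigma^3) * (3z - z^3)
    return 3 * z - z ** 3

def b_factor(z: float) -> float:
    # percentage change due to the B term is (B / sigma^4) * (3 - 6z^2 + z^4)
    return 3 - 6 * z ** 2 + z ** 4

# zeros of the B factor on the positive side: z^2 = 3 - sqrt(6) and z^2 = 3 + sqrt(6)
b_zeros = (math.sqrt(3 - math.sqrt(6)), math.sqrt(3 + math.sqrt(6)))

# locate the extreme points on a fine grid of positive z values
grid = [i / 10000 for i in range(1, 40001)]
a_peak = max(grid, key=a_factor)    # where the A-term effect is largest on the positive side
b_trough = min(grid, key=b_factor)  # where the B-term subtraction is largest when B > 0
print([round(z, 2) for z in b_zeros], round(a_peak, 3), round(b_trough, 3), round(math.sqrt(3), 3))
```

The $A$ factor also vanishes at $z = \sqrt{3}$, which is where the $A$ term switches from subtracting to adding.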



Again this result was to be expected. For the value of $B$ depends on the value of $K_4$, and this in turn depends on the values of $k_{14}$, $k_{24}$, etc., which reflect the excess above 3 or the deficit below 3 of the coefficients of kurtosis of the distributions of the individual contributions. Thus if the distributions of the individual contributions, $\epsilon$, are all more peaked than normal, i.e., if the $k_4$'s are all positive, then $K_4$, which equals

$$k_{14} + k_{24} + k_{34} + \cdots + k_{N4},$$

will also be positive, $B$ will be positive, and the distribution of $X$ will be more peaked than normal. The opposite is true if the distributions of the individual contributions are all less peaked than normal. If some of them are more peaked than normal and others are less peaked, then the peakedness of the distribution of $X$ will depend upon the predominance of excess or deficit influences; and since the individual contributions are all assumed to be of approximately the same order of magnitude, this predominance will be primarily determined by the relative number of excess and deficit influences.

Fig. 23. Combination of curves I and III' shown in Fig. 21.

Fig. 24. Combination of curves I, II', and III' shown in Fig. 21.

Usually the A and B terms are the only additional terms 
deemed worthy of consideration. For unless the number of 
causal factors, N, is relatively small, higher-order terms will be 
of negligible importance in determining the distribution of X, 
continuing to assume, of course, that the contributions of the 
various causal factors are approximately of the same order of 
magnitude. 1 In the practical work of fitting a Gram-Charlier 
curve, it also becomes difficult to determine with any degree of 
accuracy the values of these higher-order terms. 2 For these 
reasons, Eq. (1) is usually considered a sufficiently general equa¬ 
tion. In special cases of very skewed distributions, somewhat 
better results may be obtained by a type of mathematical approx¬ 
imation that gives rise to a different equation 3 from Eq. (1). 
Gram-Charlier distributions of this latter kind are called “type B distributions” to distinguish them from those given by Eq. (1), which are called “type A distributions.”

GENERAL SIGNIFICANCE OF THE GRAM-CHARLIER ANALYSIS 

The principal significance of the Gram-Charlier analysis for 
the theory of frequency curves is the explanation it affords of 
nonnormal frequency curves when the number of contributory 
causes is limited. The Pearsonian analysis, it will be recalled, 4 

1 Failure to include higher-order terms, however, may sometimes produce 
negative frequencies in certain parts of a Gram-Charlier curve, which is of 
course a practical impossibility. Such, for instance, is seen to be the case in 
Figs. 18, 22, and 24. 

2 The difficulty is that the sampling fluctuations in these higher-order 
terms are especially great. 

3 For a discussion of this special variation in the analysis, see C. V. L. Charlier, “Über die Darstellung willkürlicher Functionen,” Arkiv för Matematik, Astronomi och Fysik, Vol. 2, No. 20 (1905), pp. 1-35. An elementary discussion in English is to be found in H. L. Rietz, Mathematical Statistics, Chap. VII.

4 See p. 63. 
was strictly valid for nonnormal distributions of discrete variables, but its explanation of continuous nonnormal frequency curves depended upon a tardily introduced assumption of “lumpiness” in the contributions of the causal factors, a procedure that was not entirely satisfactory since the analysis was originally based upon the assumption of definitely fixed contributions.

The Gram-Charlier analysis assumes at the very beginning
that the contributions of a causal factor themselves vary
continuously, the probabilities of various possible values being
given by a distribution of a definite but unknown form. It then
goes on to show that if a given variable X is an algebraic sum of
the contributions ε of a number of such causal factors, if the
contributions are approximately of the same order of magnitude
(i.e., if the standard deviations of the contributions are about
equal), and if they are independent of each other, then the form
of the distribution of X will depend partly on the forms of the
distributions of the individual contributions and partly on the
number of such contributory causes. Thus, if the number of
causal factors is very large, the distribution of X will be
approximately normal, whatever the forms of the distributions of
the individual contributions. If the number of causal factors is
only moderately large, however, but their contributions are still
of the same order of magnitude, the skewness and kurtosis of the
distributions of the individual contributions will tend to produce
a skewness and kurtosis in the distribution of X. In special
cases the skewness of the individual contributions may be
compensatory. Thus it is possible for the distribution of X to be
symmetrical, not only when the distributions of the individual
contributions are themselves all symmetrical, but also when the
skewness of one or more contributions is offset by a contrary
skewness of one or more other contributions. Similar remarks
may be made with respect to the kurtosis of X.
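The claim that the skewness of the contributions carries over into X, but fades as the number of causal factors grows, can be checked with a small simulation. This is only a sketch under assumptions that are not in the text: the contributions are taken to be independent exponential variables (chosen because they are markedly skewed), each measured from its mean, and the sample size and seed are arbitrary.

```python
import random
import statistics


def sample_skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def simulate_sums(n_factors, n_draws, rng):
    """Each draw of X is the sum of n_factors independent contributions,
    all with the same (exponential, hence positively skewed) distribution,
    each measured from its mean of 1."""
    return [sum(rng.expovariate(1.0) - 1.0 for _ in range(n_factors))
            for _ in range(n_draws)]


rng = random.Random(1940)          # fixed seed: the run is reproducible
skew_by_n = {}
for n in (1, 4, 25):
    skew_by_n[n] = sample_skewness(simulate_sums(n, 20_000, rng))
    # For exponential contributions the skewness of X is 2 / sqrt(N).
    print(f"N = {n:2d}: sample {skew_by_n[n]:.3f}, theory {2 / n ** 0.5:.3f}")
```

With a moderate number of factors the asymmetry of the contributions is still plainly visible in X; it dies away only slowly, in proportion to 1/√N.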

These conclusions are much the same as those drawn from the
Pearsonian analysis pertaining to the asymmetrical binomial
distribution. In fact, this part of the Pearsonian analysis may
be considered a very special case of the Gram-Charlier analysis.
Suppose, for example, that, in the latter, the distributions of the
individual contributions are all assumed to be of a very special
sort. Suppose that in each case only one specified positive
contribution and one specified negative contribution are possible
and that these have the same absolute value. Let the probability
with which the contribution takes on its positive value differ from
that with which it takes on its negative value. Furthermore, let
these distributions of the individual contributions be all alike,
both as to the amounts of the positive and negative values and
as to the probabilities of each. Then, under these very special
assumptions, the Gram-Charlier analysis will yield the
asymmetrical binomial distribution. The latter is thus a special
case of a Gram-Charlier distribution.
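This special case can be verified exactly. The sketch below, with arbitrary illustrative values for the contribution size a, the probability p and the number of factors N, adds up N identical two-point contributions by repeated convolution and confirms that the result is the binomial distribution, with the binomial skewness (q − p)/√(Npq).

```python
import math


def convolve(dist_a, dist_b):
    """Distribution of the sum of two independent discrete contributions,
    each given as a {value: probability} dict."""
    out = {}
    for xa, pa in dist_a.items():
        for xb, pb in dist_b.items():
            out[xa + xb] = out.get(xa + xb, 0.0) + pa * pb
    return out


def skewness(dist):
    """m3 / m2**1.5 of a {value: probability} distribution."""
    mean = sum(x * p for x, p in dist.items())
    m2 = sum((x - mean) ** 2 * p for x, p in dist.items())
    m3 = sum((x - mean) ** 3 * p for x, p in dist.items())
    return m3 / m2 ** 1.5


a, p, n_factors = 1.0, 0.3, 12      # illustrative values only
q = 1.0 - p
one_factor = {+a: p, -a: q}         # contributes +a with prob. p, -a with prob. q

total = {0.0: 1.0}
for _ in range(n_factors):
    total = convolve(total, one_factor)

# If k of the N factors contribute +a, then X = a(2k - N), and its
# probability should be the binomial term C(N, k) p^k q^(N - k).
for k in range(n_factors + 1):
    binomial_term = math.comb(n_factors, k) * p**k * q ** (n_factors - k)
    assert abs(total[a * (2 * k - n_factors)] - binomial_term) < 1e-12

print(skewness(total), (q - p) / math.sqrt(n_factors * p * q))
```

Because every contribution has the same lopsided two-point distribution, none of the individual skewness is compensated, and the sum inherits it in the binomial form.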

Finally, it is to be noted that the Gram-Charlier analysis does
not, as does the Pearsonian analysis, relax the assumption of
independence of the contributory causes. The latter, it will be
recalled, laid aside this assumption and permitted the
contribution of a causal factor to be dependent on the contributions of
other factors. The skewness and kurtosis yielded by the general
Pearsonian formula were in part to be attributed to this
assumption of dependence. The Gram-Charlier analysis assumes
throughout that the contribution of any causal factor is
independent of the contributions of other causal factors. It is
restricted in this respect in the explanation that it affords of
actual frequency distributions.

APPENDIX 

DERIVATION OF THE GRAM-CHARLIER FORMULA FOR A TYPE A 
FREQUENCY DISTRIBUTION 

The following is a more detailed discussion of the Gram-Charlier
system of frequency curves than was given in the main
body of the text. Its principal purpose is to outline the
argument by which the formula for a type A frequency distribution
[Eq. (1)] is derived. For the sake of those not well acquainted
with higher mathematics, the essential steps are presented in a
more or less nonmathematical manner, while the mathematical
basis for each is sketched in a series of footnotes. For a fuller
mathematical analysis, the reader is referred to the original works
of Gram and Charlier (see footnote to page 82) or to the English
accounts of them given by Arne Fisher’s Mathematical Theory
of Probabilities or H. L. Rietz’s Mathematical Statistics.¹ The
present discussion follows closely that outlined in E. T. Whittaker
and G. Robinson, The Calculus of Observations,² pages 168 to 174.

¹ Also see Thiele, T. N., Theory of Observations, first published in 1903
(C. and E. Layton, London) and recently reprinted in the Annals of
Mathematical Statistics, Vol. 3 (1933), pp. 165-308.

The cornerstone of the Gram-Charlier analysis is the Fourier
integral theorem. This, so far as the argument in question is
concerned, states that if a certain function of a given variable,
called its “moment generating function,” is known, then the
frequency distribution of the variable itself may be determined,
and conversely, if the distribution is known, its moment generating
function may be found.³ This moment generating function is
of the form $G = \sum e^{i\theta x} f(x)\,dx$, where $x = X - \bar{X}$,
$f(x)\,dx$ is the distribution of $x$, $e$ is the constant 2.718...,
$i = \sqrt{-1}$, and $\theta$ is an arbitrary variable.⁴ G is called the
moment generating function because, if certain operations are
performed on it with reference to $i\theta$ and then $\theta$ is given the
value 0, the results are the various moments of the distribution
of $x$.⁵

The significance of the moment generating function for the
present analysis is that if a variable X is the sum of a number of
other variables, the values of any one of which are independent
of the values assumed by the others, then the moment generating
function of the composite variable X is simply the product of
the moment generating functions of the independent variables.⁶
Thus, with reference to the given problem, if variations in X
are the sum of the contributions of certain causal factors, all
acting independently of each other, the moment generating
function of the frequency distribution of X, the form of which
is being sought, is the product of the moment generating
functions of the frequency (i.e., probability) distributions of the
individual contributory factors. Symbolically,

$$G_X = G_1 G_2 \cdots G_N$$

² Blackie & Son, Ltd., Glasgow, 1925, 2d ed.

³ More exactly, if the moment generating function is defined as

$$G(\theta) = \int_{-\infty}^{\infty} f(x)\, e^{i\theta x}\, dx,$$

where $i = \sqrt{-1}$ and $f(x)\,dx$ represents the frequency distribution of $x$, then

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\theta)\, e^{-i\theta x}\, d\theta,$$

and vice versa.

⁴ When $x$ is distributed in the form of a smooth continuous curve, G is
defined more exactly by the formula of the previous footnote.

⁵ Thus, if G is differentiated with respect to $i\theta$ and $\theta$ is set equal to 0,
the result is the first moment of $f(x)$. If G is differentiated twice and then $\theta$
is set equal to 0, the result is the second moment of $f(x)$, etc. This is
frequently the simplest method of obtaining equations for the moments of a
known distribution.

⁶ The reason is that $e^{a+b} = e^a e^b$, so that
$e^{i\theta(x_1 + x_2)} = e^{i\theta x_1} e^{i\theta x_2}$; and for independent variables the
average of such a product is the product of the averages. Cf. Whittaker and
Robinson, op. cit., p. 170.

This relationship between the generating function of the sum
and that of the individual contributions is of fundamental
importance, but it does not of itself advance the argument. For there
is nothing in the given assumptions that provides any knowledge
about the generating functions of the individual contributory
factors. Hence the relationship of the latter to the generating
function of X can of itself give no information about the nature
of this resultant generating function.

The next step in the argument, however, yields more tangible
results. This is to note that the logarithm of a moment generating
function can be represented by an infinite series, the terms of
which consist of rising powers of $i\theta$ and certain quantities that
depend on the distribution to which the generating function
relates. Thus if $f(\epsilon_1)\,d\epsilon_1$ represents the probability distribution of
cause 1 and $G_1 = \sum_{-\infty}^{\infty} e^{i\theta\epsilon_1} f(\epsilon_1)\,d\epsilon_1$ is its moment generating
function, then the logarithm of $G_1$ can be put in the form

$$\log G_1 = k_{11}\, i\theta - k_{12}\frac{\theta^2}{2!} + k_{13}\frac{(i\theta)^3}{3!} + k_{14}\frac{\theta^4}{4!} + \cdots \qquad (1)$$

where $k_{11}$, $k_{12}$, etc., are quantities that are called “semivariants”
or “cumulants” and, if $\epsilon_1$ is measured from its mean, are related
to the moments $\nu_r$ of $f(\epsilon_1)\,d\epsilon_1$ as follows:¹

$$k_{11} = \nu_1 = 0, \quad k_{12} = \nu_2, \quad k_{13} = \nu_3, \quad k_{14} = \nu_4 - 3\nu_2^2, \quad \text{etc.}$$

The same can be done for the logarithms of the generating
functions of the other contributions, and since the logarithm of a
product is the sum of the logarithms of the various factors, the
logarithm of the generating function of X can be represented as
the sum of the N infinite series representing the logarithms of the
generating functions of the N individual contributions. Thus,
if the first subscript of a cumulant k represents the contribution
to which it pertains and the second the order of the cumulant,
it follows that:²

$$\log G_X = -(k_{12} + k_{22} + k_{32} + \cdots + k_{N2})\frac{\theta^2}{2!} + (k_{13} + k_{23} + k_{33} + \cdots + k_{N3})\frac{(i\theta)^3}{3!} + (k_{14} + k_{24} + k_{34} + \cdots + k_{N4})\frac{\theta^4}{4!} + \cdots \qquad (2)$$

¹ The moment generating function $G_1 = \int_{-\infty}^{\infty} e^{i\theta\epsilon_1} f(\epsilon_1)\,d\epsilon_1$ is a function of
$i\theta$ and may be written $G_1(i\theta)$. Let $\log G_1(i\theta)$ be called $K_1(i\theta)$; then $K_1(i\theta)$
may be expanded in a power series in $i\theta$ (Taylor’s expansion) in the
neighborhood of $i\theta = 0$, as follows:

$$K_1(i\theta) = K_1(0) + K_1'(0)\, i\theta + K_1''(0)\frac{(i\theta)^2}{2!} + \cdots$$

where $K_1'$, $K_1''$, etc., represent derivatives with respect to $i\theta$. The problem
is to find $K_1(0)$, $K_1'(0)$, $K_1''(0)$, ....

Since $G_1 = 1$ when $\theta = 0$ [note that the sum from $-\infty$ to $+\infty$ of the
probabilities $f(\epsilon_1)\,d\epsilon_1$ is 1], it follows that $\log G_1(0) = 0$, i.e., $K_1(0) = 0$.
Again,

$$K_1'(i\theta) = \frac{d \log G_1(i\theta)}{d(i\theta)} = \frac{1}{G_1(i\theta)}\,\frac{dG_1(i\theta)}{d(i\theta)} = \frac{G_1'(i\theta)}{G_1(i\theta)},$$

or

$$G_1(i\theta)\,K_1'(i\theta) = G_1'(i\theta).$$

Furthermore, by expanding $e^{i\theta\epsilon_1}$ in a power series, it follows that

$$G_1(i\theta) = \int f(\epsilon_1)\,d\epsilon_1 + i\theta \int \epsilon_1 f(\epsilon_1)\,d\epsilon_1 + \frac{(i\theta)^2}{2!}\int \epsilon_1^2 f(\epsilon_1)\,d\epsilon_1 + \frac{(i\theta)^3}{3!}\int \epsilon_1^3 f(\epsilon_1)\,d\epsilon_1 + \cdots = 1 + i\theta\,\nu_1 + \frac{(i\theta)^2}{2!}\nu_2 + \frac{(i\theta)^3}{3!}\nu_3 + \cdots$$

because by definition $\nu_r = \int_{-\infty}^{\infty} \epsilon_1^r f(\epsilon_1)\,d\epsilon_1$. Thus

$$G_1'(i\theta) = \nu_1 + i\theta\,\nu_2 + \frac{(i\theta)^2}{2!}\nu_3 + \cdots$$

and likewise from Taylor’s expansion for $K_1(i\theta)$ it follows that

$$K_1'(i\theta) = K_1'(0) + K_1''(0)\, i\theta + K_1'''(0)\frac{(i\theta)^2}{2!} + \cdots$$

Hence the above identity may be written

$$\left[1 + i\theta\,\nu_1 + \frac{(i\theta)^2}{2!}\nu_2 + \cdots\right]\left[K_1'(0) + K_1''(0)\, i\theta + K_1'''(0)\frac{(i\theta)^2}{2!} + \cdots\right] = \nu_1 + i\theta\,\nu_2 + \frac{(i\theta)^2}{2!}\nu_3 + \cdots$$

If the left side of this identity is multiplied out and if the coefficients of
the various powers of $i\theta$ on the left are equated to the coefficients of the same
powers of $i\theta$ on the right, it follows that, when $\epsilon_1$ is measured from its mean,

$$K_1'(0) = \nu_1 = 0, \qquad K_1''(0) = \nu_2, \qquad K_1'''(0) = \nu_3,$$

and

$$\frac{\nu_2}{2!}K_1''(0) + \frac{K_1^{\mathrm{iv}}(0)}{3!} = \frac{\nu_4}{3!},$$

or

$$K_1^{\mathrm{iv}}(0) = \nu_4 - 3\nu_2^2.$$

If $k_{11}$ is set equal to $K_1'(0)$, $k_{12}$ to $K_1''(0)$, $k_{13}$ to $K_1'''(0)$, $k_{14}$ to
$K_1^{\mathrm{iv}}(0)$, etc., the equation for $\log G_1(i\theta) = K_1(i\theta)$ is seen to be
equation (1) of the text.

Comparison of the coefficients of Eq. (2) with those of (1)
indicates that the cumulants of the generating function of the
distribution of X are the sums of the corresponding cumulants of the
generating functions of the individual contributions. That is,
if the cumulants of X are indicated by $K_2$, $K_3$, etc., it follows that

$$\begin{aligned} K_2 &= k_{12} + k_{22} + \cdots + k_{N2} \\ K_3 &= k_{13} + k_{23} + \cdots + k_{N3} \\ K_4 &= k_{14} + k_{24} + \cdots + k_{N4}, \quad \text{etc.} \end{aligned} \qquad (3)$$
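The two facts on which the argument turns, the moment relations for the cumulants and the additivity of cumulants just stated, can be checked exactly for small discrete contributions. In this sketch the two contribution distributions are hypothetical, chosen only to be skewed and unlike each other; exact rational arithmetic removes any doubt about rounding.

```python
from fractions import Fraction


def central_moments(dist, order=4):
    """Moments nu_2 .. nu_order about the mean of a {value: prob} distribution."""
    mean = sum(x * p for x, p in dist.items())
    return {r: sum((x - mean) ** r * p for x, p in dist.items())
            for r in range(2, order + 1)}


def cumulants(dist):
    """Second to fourth cumulants from the moment relations of the text:
    k2 = nu2, k3 = nu3, k4 = nu4 - 3 * nu2**2 (moments about the mean)."""
    nu = central_moments(dist)
    return (nu[2], nu[3], nu[4] - 3 * nu[2] ** 2)


def distribution_of_sum(dist_a, dist_b):
    """Exact distribution of the sum of two independent contributions."""
    out = {}
    for xa, pa in dist_a.items():
        for xb, pb in dist_b.items():
            out[xa + xb] = out.get(xa + xb, 0) + pa * pb
    return out


# Two hypothetical, deliberately skewed and unlike contributions.
eps1 = {0: Fraction(7, 10), 3: Fraction(3, 10)}
eps2 = {-2: Fraction(1, 5), 0: Fraction(1, 2), 1: Fraction(3, 10)}

k_sum = cumulants(distribution_of_sum(eps1, eps2))
k_parts = tuple(a + b for a, b in zip(cumulants(eps1), cumulants(eps2)))
print(k_sum)
print(k_parts)   # the same cumulants, as expression (3) asserts
```

The fourth cumulant is where the correction $-3\nu_2^2$ matters: the fourth moments themselves do not add, but the fourth cumulants do.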

It is this last expression that permits certain inferences
regarding the moment generating function of X and hence the
distribution of X. Thus suppose that two additional assumptions are
made regarding the distributions of the contributions of the
individual causal factors, viz.: (1) that the number of these
causal factors, N, is very large and (2) that the variances $\sigma^2 = \nu_2$
of the individual contributions are all approximately of the same
order of magnitude. The cumulants $k_{12}, k_{22}, \ldots, k_{N2}$, it will be
recalled, are equal to the variances of the individual contributions,
and expression (3) shows that their sum gives the variance $K_2$ of X.
The variances of physical, biological, and economic variables are
generally of finite size, so that for practical purposes it may be
assumed that the variance of X is finite. Since it is equal to the sum
of N variances, all of about the same order of magnitude, it can be
concluded that each of the contributory variances is of the order
of $1/N$ and that each contribution is of order $1/\sqrt{N}$.

From this it follows that the higher cumulants and moments
are of higher order than $1/N$; for example, that $k_{13}$, $k_{23}$, etc.,
are of order $1/\sqrt{N^3}$ and that $k_{14}$, $k_{24}$, etc., are of order $1/N^2$.
Since $K_3$ is equal to the sum of the N cumulants $k_{13}$, $k_{23}$, etc., and
$K_4$ is equal to the sum of the N cumulants $k_{14}$, $k_{24}$, etc., the
magnitudes of these higher cumulants of X are therefore of the order of
$1/\sqrt{N}$ and $1/N$, respectively. Consequently, if the variance of
X is finite and if N is very large, as is assumed, $K_3$, $K_4$,
and the higher cumulants of X will be very small. A first
approximation to the generating function of the distribution of
X is thus given by

$$\log G_X = -K_2\frac{\theta^2}{2!} \qquad \text{or} \qquad G_X = \exp\left[-\frac{K_2\theta^2}{2}\right],$$

and from this, by means of the Fourier integral theorem mentioned
above, it may be shown³ that a first approximation to the
distribution of X itself is

$$f(x) = \frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right],$$

which is the formula for the normal curve. The Gram-Charlier analysis
thus shows that if the number of the contributory causes is very
large and if their contributions are of about the same order of
magnitude, i.e., if the variances of the contributions of the
individual causal factors are roughly the same, the distribution of the
resultant variable X will approximate the normal form.

² For, it is to be remembered, $k_{11} = 0$ on the assumption that $\epsilon_1$ is
measured from its mean, and the same is true for $k_{21}$, $k_{31}$, etc., on the
assumption that all the ε’s are measured from their means.

³ Since, according to the Fourier integral theorem, the frequency distribution
of X is

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} G(\theta)\, e^{-i\theta x}\, d\theta$$

when its moment generating function is $G(\theta) = \int_{-\infty}^{\infty} f(x)\, e^{i\theta x}\, dx$, it
follows that, when $G(\theta) = \exp[-K_2\theta^2/2]$,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right] d\theta.$$

But if the square in the exponent is completed, this becomes

$$f(x) = \frac{1}{2\pi}\exp\left[-\frac{x^2}{2K_2}\right]\int_{-\infty}^{\infty} \exp\left[-\left(\sqrt{\frac{K_2}{2}}\,\theta + \frac{ix}{\sqrt{2K_2}}\right)^2\right] d\theta.$$

With $z = \sqrt{K_2/2}\,\theta + ix/\sqrt{2K_2}$, the integral is $\sqrt{2/K_2}$ times an
integral of the form $\int_{-\infty}^{\infty} e^{-z^2}\,dz$, which equals $\sqrt{\pi}$. Hence

$$f(x) = \frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right].$$
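The inversion carried out in the last footnote can also be checked numerically. This is a sketch, not the text's method: it approximates the Fourier integral by the trapezoidal rule over a finite range (the cut-off and the number of steps are arbitrary choices), using the fact that for a real, even G the imaginary parts cancel, and it compares the result with the closed-form normal curve.

```python
import math


def invert_gaussian_generating_function(x, k2, theta_max=40.0, steps=20_000):
    """Approximate f(x) = (1/2pi) * integral of G(theta) e^{-i theta x} d theta
    for G(theta) = exp(-K2 theta^2 / 2) by the trapezoidal rule.

    G is real and even, so the integral reduces to
    (1/pi) * integral from 0 to infinity of exp(-K2 t^2 / 2) cos(t x) dt.
    """
    h = theta_max / steps
    total = 0.5 * (1.0 + math.exp(-k2 * theta_max**2 / 2) * math.cos(theta_max * x))
    for j in range(1, steps):
        t = j * h
        total += math.exp(-k2 * t * t / 2) * math.cos(t * x)
    return total * h / math.pi


def normal_density(x, k2):
    """Closed form reached in the footnote: exp(-x^2 / 2K2) / sqrt(2 pi K2)."""
    return math.exp(-x * x / (2 * k2)) / math.sqrt(2 * math.pi * k2)


K2 = 2.5  # an arbitrary illustrative variance
for x in (0.0, 1.0, 3.0):
    print(x, invert_gaussian_generating_function(x, K2), normal_density(x, K2))
```

The two columns agree to many decimal places, which is a direct check on the completed-square argument.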
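If the variances of the individual contributions are about the
same (that is, if $k_{12}$, $k_{22}$, etc., are approximately equal) but
the number of causal factors, N, is not large enough to warrant the
dropping of terms of the order of $1/\sqrt{N}$ and $1/N$, although
sufficiently large to warrant the dropping of terms of higher order
(i.e., terms of order $1/\sqrt{N^3}$, $1/N^2$, etc.), then the moment
generating function of the distribution of X will be given approximately¹
by

$$\log G_X = -K_2\frac{\theta^2}{2!} + K_3\frac{(i\theta)^3}{3!} + K_4\frac{\theta^4}{4!}.$$

From this generating function, the distribution of X is found to be²

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\left[1 + \frac{A}{\sigma^3}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) + \frac{B}{\sigma^4}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right] \qquad (4)$$

where $x = X - \bar{X}$, σ is the standard deviation of x (so that $\sigma^2 = K_2$),
$A = -K_3/3!$, and $B = K_4/4!$.

¹ For $K_5$, $K_6$, and the higher cumulants of the generating function will
all be of higher order than $1/N$; and, according to the assumption, terms of
that order may be considered as negligible.

² If

$$\log G_X = -K_2\frac{\theta^2}{2!} + K_3\frac{(i\theta)^3}{3!} + K_4\frac{\theta^4}{4!}
\quad\text{and}\quad
G_X = \exp\left[-\frac{K_2\theta^2}{2!} + \frac{K_3(i\theta)^3}{3!} + \frac{K_4\theta^4}{4!}\right],$$

then the distribution of x is given by

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left[-\frac{K_2\theta^2}{2!} + \frac{K_3(i\theta)^3}{3!} + \frac{K_4\theta^4}{4!} - i\theta x\right] d\theta.$$

This may also be written, to within terms of order $1/N$,

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \left[1 + \frac{K_3(i\theta)^3}{3!} + \frac{K_4\theta^4}{4!}\right]\exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right] d\theta.$$

But as noted in the footnote to page 97,

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right] d\theta = \frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right],$$

and successive differentiation with respect to x gives

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} (-i\theta)^n \exp\left[-\frac{K_2\theta^2}{2} - i\theta x\right] d\theta = \frac{d^n}{dx^n}\left(\frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right]\right).$$

Hence $f(x)$ may be written in operational form,

$$f(x) = \left[1 - \frac{K_3}{3!}\frac{d^3}{dx^3} + \frac{K_4}{4!}\frac{d^4}{dx^4}\right]\frac{1}{\sqrt{2\pi K_2}}\exp\left[-\frac{x^2}{2K_2}\right],$$

or, actually working out the differentiation,

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\left[1 + \frac{A}{\sigma^3}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) + \frac{B}{\sigma^4}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)\right]\exp\left[-\frac{x^2}{2\sigma^2}\right],$$

where $\sigma^2 = K_2$, $A = -K_3/3!$, and $B = K_4/4!$.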
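Eq. (4) is constructed so that the curve keeps unit area and zero mean and reproduces the prescribed second, third, and fourth cumulants. A minimal numerical check, assuming illustrative values for $K_2$, $K_3$, $K_4$ and evaluating the integrals by the trapezoidal rule over ±14 standard deviations (both choices are arbitrary), can confirm this:

```python
import math


def gram_charlier_type_a(x, k2, k3, k4):
    """Eq. (4): type A curve with sigma^2 = K2, A = -K3/3!, B = K4/4!."""
    s = math.sqrt(k2)
    a = -k3 / math.factorial(3)
    b = k4 / math.factorial(4)
    u = x / s
    bracket = 1 + (a / s**3) * (3 * u - u**3) + (b / s**4) * (3 - 6 * u**2 + u**4)
    return bracket * math.exp(-u * u / 2) / (s * math.sqrt(2 * math.pi))


def moments_about_zero(k2, k3, k4, order=4, half_width=14.0, steps=28_000):
    """Trapezoidal estimates of the integrals of x^r f(x) dx, r = 0 .. order."""
    s = math.sqrt(k2)
    lo, h = -half_width * s, 2 * half_width * s / steps
    sums = [0.0] * (order + 1)
    for j in range(steps + 1):
        x = lo + j * h
        w = 0.5 if j in (0, steps) else 1.0          # trapezoid end weights
        fx = gram_charlier_type_a(x, k2, k3, k4)
        for r in range(order + 1):
            sums[r] += w * x**r * fx
    return [total * h for total in sums]


K2, K3, K4 = 4.0, 1.2, 0.8           # illustrative cumulants of X
m = moments_about_zero(K2, K3, K4)
print("area", m[0])                  # total frequency
print("mean", m[1])                  # x is measured from the mean
print("variance", m[2])              # reproduces K2
print("third cumulant", m[3])        # reproduces K3 (mean is zero)
print("fourth cumulant", m[4] - 3 * m[2] ** 2)   # reproduces K4
```

The check only concerns the cumulants; for large values of |A| or B the bracket can still turn negative in the tails, which is how negative frequencies can arise in a truncated Gram-Charlier curve.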




CHAPTER VI 

SUMMARY OF THE THEORY OF FREQUENCY CURVES, 
AND SOME EXAMPLES 

At the end of Chap. III a summary of the conditions leading to
a normal curve was given. These will now be reviewed, and the
conditions leading to nonnormality will be summarized.

Summary of Conditions Leading to Normality. The Pearsonian
and Gram-Charlier analyses suggest that a variable will
tend to be normally distributed if (1) its fluctuations are the sum
of the fluctuations in a number of contributory causes; (2) the
fluctuations in the contributory causes are all of about the same
order of magnitude; (3) the contributory causes act independently
of each other; and (4) the number of contributory causes is large,
while the contribution of each is relatively small. These
conditions, it will be presumed, are sufficient to produce a normally
distributed variable. It is not to be presumed, however, that
they are all necessary. It is possible, for example, that the
presence of (4) may under some conditions make (3) unnecessary.

Summary of Conditions Giving Rise to Nonnormality. The
conditions that are likely to give rise to nonnormal frequency
curves are in general the negation of the conditions that give
rise to normal frequency curves. Thus, the distribution of a variable
X may fail to be normal for any one of the following reasons:

1. If the deviations of X from some central value are simply
an algebraic sum of the contributions of a number of causal
factors, if these contributions are of about the same order of
magnitude, and if they are independent of each other, the
distribution of X may not be normal (even approximately) if the
distributions of the individual contributions are themselves
nonnormal and if the number of causal factors is not exceptionally
large.

2. If the deviations of X from some central value are an
algebraic sum of the contributions of a number of causal factors
and if these act independently of each other, the distribution
of X will not be normal if the contributions of a few of the causal
factors are of outstanding importance as compared with the
others and if the distributions of these contributions are
themselves not normal. If the magnitude of one contribution, for
example, overshadowed that of all the others, then the
distribution of X would tend to conform to the distribution of that
particular contribution.

In this connection it is to be noted that, if one or more
contributing factors are of outstanding importance compared with
other causes of variation, the conditions producing variation
in X might possibly be considered as nonhomogeneous. This is
especially likely to be true if the factors in question are qualitative
in nature.

Consider, for example, the heights of adult human beings. Of
all the single factors affecting mature height, sex and race are
among the most important. Their effects are so outstanding that
unless they are removed by statistical classification, they
overshadow the effects of most other causes of variation among
mature persons. Furthermore, both sex and race are qualitative
factors that make a contribution to height of one amount when
one sex or race is present and a contribution of a distinctly
different amount when another sex or race is present. The net effect
is to produce a variable, viz., adult height, that is not normally
distributed.



102 GENERAL THEORY OF FREQUENCY CURVES 

The effects of sex and race being of this sort, it is the usual
practice to view height in general as a heterogeneous variable
whose frequency distribution has little significance. For when
adult human beings are segregated according to sex and race,
their heights form a distinctly normal distribution. This serves
to illustrate how a distribution may be nonnormal because the
variable in question is nonhomogeneous.



Figure 25 and Table 13 offer an example of the production
of a nonnormal distribution through the combination of two
normal distributions with different mean values. This is what
the distribution of the heights of white adults would be like. It
is of little significance because the important effect of the sex
factor has not been separated from the minor effects of the many
other factors influencing height.

3. If the deviations of X from some central value are an
algebraic sum of the contributions of a number of causal factors
and if these are all of about the same order of magnitude, the
distribution of X will not be normal if the amount of the
contribution of a causal factor is dependent on the contributions of
other causal factors. It is possible, however, that even under
these conditions the distribution of X will be approximately
normal if the number of contributory causes is very large.¹

4. If the deviations in X from some normal value are not a
simple algebraic sum of the contributions of a number of causal
factors but are related to them in a more complex manner, it is
likely that the distribution of X will not be normal. Thus
suppose that the deviations in X are equal to the cube of the sum
of the individual contributions instead of the sum itself. Then,
although the sum may be normally distributed, X will be
distributed in a skewed manner.²

This suggests that where certain “linear” measurements of a
set of “individuals” are normally distributed, other “nonlinear”
measurements of the same individuals will not be normally
distributed. For example, the heights of adult males of the white
race make up a set of homogeneous linear measurements, while
the weights of these individuals compose a set of nonlinear
measurements in the sense that the weight of an individual is
highly correlated with his “volume,” a quantity that is likely
to vary with the cube of height rather than the height itself.
In actuality, the heights of adult white males are found to be
normally distributed, while their weights are definitely skewed,
thus giving an empirical illustration of the relationship between
the distribution of a sum and the distribution of its cube.

5. In some cases data that are selected from a larger normal
group of data are themselves nonnormally distributed because
of the special way in which they are selected. Suppose, for
example, that the United States Army refuses to take adult
males whose heights are less than 64 or greater than 74 inches.
The distribution of the heights of Army men would then be a
“truncated” normal distribution, such as that pictured in Fig.
26. The extension of the curve by a dotted line recognizes the
possibility that a few exceptionally qualified men would be
accepted in spite of the rule. Again, suppose that the students
who select courses in higher mathematics are only the better
students, say the upper two thirds. Although the intelligence
quotients of all students would be normally distributed, those of
mathematics students would be more or less like the upper part
of a normal curve. This is illustrated in Fig. 27, the rounded
tail below 80 indicating that a few below that score would be
allowed to enter the mathematics course despite the general rule.
On the other hand, the distribution of I.Q.’s of students taking
courses in a notoriously easy field of study would be more or less
like the lower part of a normal curve, such as that pictured in
Fig. 28. In all these cases no radical departure from the analysis
of the previous chapter is required to explain the nonnormal
character of the data. Nonnormality of the subgroup results
only from special selection from a larger normal group.³

Fig. 26. Nonnormal distribution of heights resulting from special selection.

Fig. 27. Nonnormal distribution of grades resulting from special selection,
positively skewed.

Fig. 28. Nonnormal distribution of grades resulting from special selection,
negatively skewed.

These are the principal reasons, it would appear, for the
occurrence of nonnormal frequency distributions.

¹ Cf. p. 63. Also see Markov, A., Wahrscheinlichkeitsrechnung (B. G.
Teubner, Leipzig, 1912), Appendices II and III.

² Cf. Rietz, H. L., Mathematical Statistics, pp. 72-74, and “Frequency
Distributions Obtained by Certain Transformations of Normally
Distributed Variates,” Annals of Mathematics, Vol. 23 (1922), pp. 292-300.

³ Note that the nonnormality of the subgroup arises from its selection,
not at random, but with definite reference to the attribute of its members.
A nonnormal subgroup may be obtained from a larger normal group in this
way even though the latter is homogeneous.

EXAMPLES OF NONNORMAL FREQUENCY DISTRIBUTIONS

Examples from Everyday Life. Figure 29 shows the distribution
of weights of 300 Princeton freshmen of the class of 1943.
Its general shape and the values of β₁ = .36795 and β₂ = 4.6057
(corrected for grouping) indicate a definite departure from
normality. Since height is a linear measurement and since weight
is related to volume, which is a cubical measurement, it is
possible, as suggested in the previous section, that the departure of
weights from normality is due to the variation in weight being
equal to the cube of a sum of elementary variations rather than
to the sum itself.
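The moment ratios β₁ and β₂ quoted above (0 and 3 for a normal curve) can be computed for simulated data to illustrate two of the reasons for nonnormality listed earlier: the cube of a normal sum (reason 4) and the combination of two normal groups with different means (the case of Fig. 25). All figures in this sketch are hypothetical; none are taken from Fig. 29 or Table 13, and because the simulated data are not grouped, no correction for grouping is applied.

```python
import random
import statistics


def moment_ratios(values):
    """Pearson's beta_1 = mu3^2 / mu2^3 and beta_2 = mu4 / mu2^2
    (ungrouped data, so no correction for grouping is needed)."""
    mean = statistics.fmean(values)
    mu2, mu3, mu4 = (statistics.fmean([(v - mean) ** r for v in values])
                     for r in (2, 3, 4))
    return mu3**2 / mu2**3, mu4 / mu2**2


rng = random.Random(1943)
n = 100_000
# Hypothetical population figures, chosen only for illustration.
heights = [rng.gauss(69.0, 2.7) for _ in range(n)]        # one homogeneous group
cubed = [h**3 for h in heights]                            # "volume-like" measure
mixed = [rng.gauss(69.0, 2.7) if rng.random() < 0.5        # two equal groups whose
         else rng.gauss(64.0, 2.7) for _ in range(n)]      # means differ by 5 in.

for label, data in (("heights", heights), ("cubed", cubed), ("mixed", mixed)):
    b1, b2 = moment_ratios(data)
    print(f"{label:8s} beta_1 = {b1:.4f}  beta_2 = {b2:.3f}")
```

The cubed variable shows a clear β₁ (skewness), while the mixed variable keeps β₁ near zero but has β₂ below 3, a flattened curve. Note that the skewness of the cube comes from cubing a quantity with a positive mean, as heights have; cubing deviations centred exactly at zero would give a symmetrical, though long-tailed, distribution.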



Fig. 29.—Histogram and fitted Gram-Charlier curve, distribution of weights of 
300 Princeton freshmen. 

The distribution of family incomes is a distinctly nonnormal 
distribution. This is illustrated by the distribution of family 
incomes in the United States in 1935-1936, as shown in Fig. 30. 
There are various causes for this departure from normality. 
Possibly one important cause is that, the more money a family 
has, the easier it is to get still more money. That is, it is likely 
that a principal cause of the nonnormality of the distribution of 
incomes is the lack of independence of the factors contributing to 
variation. 
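The suggested mechanism, in which the contribution of a factor depends on what the family already has, can be contrasted with independent additive contributions in a toy model. Everything here is a hypothetical assumption, not drawn from the income data of Fig. 30: 50 random shocks per family, each uniform on [−0.1, 0.1], applied either additively or in proportion to current income.

```python
import random
import statistics


def skewness(values):
    """Moment coefficient of skewness, m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


def simulate_incomes(n_families, n_factors, dependent, rng):
    """Hypothetical model: each family starts at 1.0 and receives n_factors
    random shocks u drawn uniformly from [-0.1, 0.1].

    dependent=False: each shock adds u, whatever the family already has.
    dependent=True:  each shock adds u * income, so a family that already
                     has more gains (or loses) more.
    """
    incomes = []
    for _ in range(n_families):
        income = 1.0
        for _ in range(n_factors):
            u = rng.uniform(-0.1, 0.1)
            income += u * income if dependent else u
        incomes.append(income)
    return incomes


rng = random.Random(1935)
independent = simulate_incomes(20_000, 50, dependent=False, rng=rng)
dependent = simulate_incomes(20_000, 50, dependent=True, rng=rng)
print(round(skewness(independent), 3), round(skewness(dependent), 3))
```

Only the dependent version develops a long right tail; with the same shocks applied independently, the sum stays nearly symmetrical, as the normal-curve conditions would lead one to expect.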
Fig. 30. Distribution of family incomes in the United States, 1935-1936.
(National Resources Committee, Consumer Income in the United States, p. 3.)

Examples from Sampling Analysis. Some of the better known
forms of nonnormal frequency distributions are produced by the
process of random sampling. The theory of sampling that
provides the mathematical models for various types of sampling
problems is discussed at some length in Parts II and III of this
volume. It will be sufficient here to call attention to certain
phases of sampling theory that are closely related to the
discussion of the previous chapters and serve to illustrate the generation
of nonnormal frequency curves. It is to be noted that some of the
sampling distributions here discussed are nonnormal only when
the samples are relatively small and tend toward normality
as the size of the sample is increased. This tendency will be
noted in the discussion so that the conditions producing
normality and nonnormality will be clearly contrasted.

Sampling Distribution of the Mean for Any Population. The
theoretical discussion of the previous sections offers considerable
information about the distribution of the means of samples
drawn at random from a large (theoretically infinite) population.
For it will be noted that the mean is merely 1/N times the sum
of the individual cases. The variations in the individual cases
from sample to sample are thus the “contributory causes” (the
ε’s of the Gram-Charlier analysis) of the sampling variation in
the mean. Since the individual cases all come from the same
population,¹ their individual distributions are all alike [that is,
$f(\epsilon_1) = f(\epsilon_2) = f(\epsilon_3)$, etc.]. Hence the kth cumulant of the
distribution of the mean will be $1/N^{k-1}$ times the kth cumulant of the
population from which the samples are taken.²

Since the second cumulant equals the variance, it follows
that the variance of the distribution of the mean will equal 1/N
times the variance of the population. Likewise, the third
cumulant of the distribution of the mean (which equals the third
moment about the mean of that distribution) equals $1/N^2$ times the third
cumulant (i.e., moment) of the population, and the fourth cumulant of the
distribution of the mean equals $1/N^3$ times the fourth cumulant
of the population. Therefore, the distribution of sample means
will have the same general form as the population from which
the samples were drawn. Its variance, however, will be less, its
skewness much less, and its kurtosis very much less than that
of the population. In fact, if the sample is large, the skewness
and kurtosis will be practically nonexistent and the distribution
of sample means will become practically normal.

¹ The population is assumed to be so large that the successive withdrawals
of the members of a sample do not materially affect the distribution of
probabilities in the population.

² The cumulants of the sum of the cases would be N times the cumulants
of the population, and therefore the cumulants of the mean (which equals
1/Nth of the sum) would be $1/N^{k-1}$ times the cumulants of the population.
Note that the kth cumulant of $f(X/N)$ is $1/N^k$ times the kth cumulant of
$f(X)$.

Sampling Distribution of Any Linear Function. What is true
of the mean is also true of any statistic that is a linear function of
the individual variables. For example, the regression coefficient
$b_{12} = \sum x_1 x_2 / \sum x_2^2$. If samples are drawn from a bivariate
population in such a manner that the $x_2$ values are always the same
and only the $x_1$ values change, then $b_{12}$ becomes merely a linear
function of the sample $x_1$’s, viz.,

$$b_{12} = A_1 x_1^{[1]} + A_2 x_1^{[2]} + \cdots + A_n x_1^{[n]}$$

where the A’s are dependent on the given values of the $x_2$’s and
the numbers in the brackets differentiate the various sample
values of $x_1$. The cumulants of the sampling distribution of $b_{12}$
are accordingly related to the cumulants of the individual $x_1$’s
as follows:

$$\begin{aligned} K_2 &= A_1^2 k_{12} + A_2^2 k_{22} + \cdots + A_n^2 k_{n2} \\ K_3 &= A_1^3 k_{13} + A_2^3 k_{23} + \cdots + A_n^3 k_{n3} \\ K_4 &= A_1^4 k_{14} + A_2^4 k_{24} + \cdots + A_n^4 k_{n4} \end{aligned}$$
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in which kn means the second cumulant of the first case in the 
sample of xi% that is, z [ x 1] ; means the third cumulant of aV 1 ; 
k n i means the fourth cumulant of the nth case in the sample of 
au’s, that is, a: 1 " 1 ; etc. 

The variance of the sampling distribution of b i2 will conse¬ 
quently be directly related to the variance of the individual 
iTi , s, bein^ merely their weighted sum. Similarly, the skewness 
and kurtosis of the sampling distribution of bn will be a weighted 
average of the skewness and kurtosis of the distributions of the 
individual au’s. If the distributions of the aids for each x 2 are 
all alike in form (but not necessarily having the same mean, 
variance, etc.), the sampling distribution of bi 2 will have exactly 
the same sort (but not the same degree) of skewness and kurtosis 
as the individual #ds. 

It is to be noted, however, that the "weights" that enter into the relationships between the cumulants of $b_{12}$ and those of the individual $x_1$'s (i.e., the $A_1, A_2, \ldots, A_n$) are equal to

$$A_i = \frac{x_2^{[i]}}{N\sigma_2^2} \qquad \left(\text{for } \textstyle\sum x_2^2 = N\sigma_2^2\right)$$

So, if the variances of the individual $x_1$'s were identical or were about the same order of magnitude, the variance of $b_{12}$ would be $1/N$ times the order of magnitude of the variances of the individual $x_1$'s, and the third and fourth cumulants would be $1/N^2$ and $1/N^3$ times the order of magnitude of the corresponding cumulants of the individual $x_1$'s. Consequently, as in the case of the mean, the sampling distribution of the regression coefficient $b_{12}$ will have much less skewness and much less kurtosis than the original distributions of the individual $x_1$'s. In fact, if the size of the sample is large, the distribution of $b_{12}$ will be practically normal. This will be true whether or not the distributions of the individual $x_1$'s for the various $x_2$'s are normal, and whether or not they are alike.
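The shrinking of skewness and kurtosis can be checked numerically. The sketch below is illustrative only: it assumes independent $x_1$'s that all share the cumulants of a unit exponential distribution ($k_2 = 1$, $k_3 = 2$, $k_4 = 6$), hypothetical $x_2$ values $1^2, 2^2, \ldots, N^2$, and weights $A_i = x_2^{[i]}/\sum x_2^2$ with $x_2$ measured from its mean. The helper name `b12_shape` is invented for the example.

```python
def b12_shape(x2, k2=1.0, k3=2.0, k4=6.0):
    """Skewness and kurtosis (normal curve = 3) of b12 = sum of A_i * x1_i.

    Assumes independent x1's sharing the cumulants k2, k3, k4; the defaults
    (those of a unit exponential distribution) are purely illustrative.
    """
    mean2 = sum(x2) / len(x2)
    dev = [v - mean2 for v in x2]          # x2 measured from its mean
    ss = sum(d * d for d in dev)           # sum of x2^2, i.e. N * sigma_2^2
    A = [d / ss for d in dev]              # the weights A_i
    K2 = k2 * sum(a ** 2 for a in A)       # kappa_2 = sum A_i^2 k_i2
    K3 = k3 * sum(a ** 3 for a in A)       # kappa_3 = sum A_i^3 k_i3
    K4 = k4 * sum(a ** 4 for a in A)       # kappa_4 = sum A_i^4 k_i4
    return K3 / K2 ** 1.5, 3 + K4 / K2 ** 2

shapes = {}
for N in (10, 100, 1000):
    x2 = [i * i for i in range(1, N + 1)]  # hypothetical, unevenly spaced x2's
    shapes[N] = b12_shape(x2)
    print(f"N = {N:4d}   skewness = {shapes[N][0]:.4f}   kurtosis = {shapes[N][1]:.4f}")
```

Each tenfold increase in N cuts the skewness by roughly $\sqrt{10}$ and moves the kurtosis toward 3, in line with the $1/N^2$ and $1/N^3$ orders of magnitude of the third and fourth cumulants. With symmetrically placed $x_2$'s the odd-order weights cancel and the skewness of $b_{12}$ vanishes altogether.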

The t Distribution. A number of sample statistics from normal populations have sampling distributions that are nonnormal because they are not linear functions of the cases, although the distributions all approach normality as the size of the sample is increased. Three important nonnormal sampling distributions are the so-called "t distribution," the "$\chi^2$ distribution," and the "F distribution."¹ The occasions on which these distributions arise will be discussed in subsequent chapters. The purpose here is merely to describe these three important nonnormal sampling distributions.

Fig. 31. The standard normal curve compared to the curve of Student's distribution.

A graph of the $t$ distribution is shown in Fig. 31.² It will be noticed that the curve is symmetrical about the mean and looks very much like the normal curve. It has larger tails than the normal curve, however, as can be seen from a comparison of the two curves.

1 The t distribution is also referred to as "Student's distribution," after the pseudonym of the man (W. S. Gosset) who first called attention to it. Sometimes the variable $z = \frac{1}{2}\log_e F$ is used instead of F. The distribution of $z = \frac{1}{2}\log_e F$ is shown on page 114.

2 It will be noted that the vertical scales of figures showing discrete probabilities are labeled "Probabilities," or "P(x)," "P(H)," which mean "probability of an x" and "probability of an H," etc. Vertical scales of figures showing probability curves cannot properly be so labeled; for in the graph of a probability curve the vertical distance is merely an ordinate. The probability of a case falling in a given range is the area under the curve over that range. If the vertical scale of the graph of a probability curve is labeled $f(x)$, then the probability, i.e., $P(x)$, of a case falling in the infinitesimal range $dx$ is given by $P(x) = f(x)\,dx$. For a finite range, $x_1$ to $x_2$,

$$P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} f(x)\,dx$$

In the text (see page 116) $y_t$, $y_F$, $y_{\chi^2}$, etc., are used as equivalents of $f(t)$, $f(F)$, $f(\chi^2)$, etc. In this book, the vertical scales of probability curves are, generally speaking, not labeled.

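The formula for the t distribution is

$$dP = \frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1+\frac{t^2}{n}\right)^{-\frac{n+1}{2}} dt \qquad (1)$$

where $n$ is a constant that determines the shape of the curve, just as $\bar X$ and $\sigma$ do in the case of the normal curve.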

In general, the formula says that the infinitesimal portion of the total area cut off by the infinitesimal class interval $t$ to $t + dt$ is equal approximately to the area of the rectangle whose height is

$$\frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1+\frac{t^2}{n}\right)^{-\frac{n+1}{2}}$$

and whose base is $dt$. The $t$ curve has its mode at 0 and tapers off symmetrically in both directions as $t$ goes to $+\infty$ and $-\infty$. Its mean is 0 and its variance is $n/(n-2)$. Its skewness is zero, and its kurtosis is greater than 3 but approaches 3 as $n$ increases. In general, the $t$ curve approaches the normal curve as $n$ increases; in fact, the standard normal curve is a good approximation to the $t$ curve for values of $n > 30$.¹
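To see how quickly the $t$ curve approaches the normal curve, the ordinates of Eq. (1) can be compared with those of the standard normal curve. This is a sketch that uses Python's `math.gamma` together with $x! = \Gamma(x+1)$ for the fractional factorials; `max_gap` and the grid it scans (from −6 to 6 in steps of .01) are illustrative choices, not part of the text.

```python
import math

def t_ordinate(t, n):
    """Ordinate of the t curve, Eq. (1); ((n-1)/2)! = gamma((n+1)/2), ((n-2)/2)! = gamma(n/2)."""
    k = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return k * (1 + t * t / n) ** (-(n + 1) / 2)

def normal_ordinate(z):
    """Ordinate of the standard normal curve."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def max_gap(n, lo=-6.0, hi=6.0, steps=1200):
    """Largest absolute difference between the t and normal ordinates on a grid."""
    h = (hi - lo) / steps
    return max(abs(t_ordinate(lo + i * h, n) - normal_ordinate(lo + i * h))
               for i in range(steps + 1))

for n in (4, 10, 30, 100):
    print(n, round(t_ordinate(0, n), 4), round(max_gap(n), 4))
```

For $n = 4$ the ordinate at $t = 0$ is exactly .375, and by $n = 30$ the largest difference from the normal ordinates is only a few thousandths.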

1 A better approximation for values of $n$ between 30 and 100, say, is given by taking $\sigma_t = \sqrt{n/(n-2)}$ instead of 1.

The $\chi^2$ Distribution. The $\chi^2$ distribution is a positively skewed distribution, a picture of which is given in Fig. 32. The mathematical formula for the curve is

$$dP = \frac{e^{-\chi^2/2}\,(\chi^2)^{\frac{n-2}{2}}}{2^{n/2}\left(\frac{n-2}{2}\right)!}\,d(\chi^2) \qquad (2)$$

where $n$ is a quantity that determines the shape and position of the curve. In general, the formula says that the infinitesimal portion of the total area cut off by the infinitesimal class interval $\chi^2$ to $\chi^2 + d\chi^2$ is equal approximately to the area of a rectangle whose height is

$$\frac{e^{-\chi^2/2}\,(\chi^2)^{\frac{n-2}{2}}}{2^{n/2}\left(\frac{n-2}{2}\right)!}$$

and whose base is $d(\chi^2)$.


The $\chi^2$ curve begins at 0, rises to a peak at $\chi^2 = n - 2$, and falls again to zero as $\chi^2$ goes to infinity. The mean of the curve is at $n$, and its standard deviation is $\sqrt{2n}$. The skewness of the curve, as measured by $(\bar X - M_o)/\sigma$, is $\sqrt{2/n}$. There are thus different $\chi^2$ curves for different values of $n$. As $n$ varies, the curve changes both its position and its shape. For larger values of $n$, the curve is located further along the $\chi^2$ axis and is more spread out and more symmetrical. Figure 32 pictures two different $\chi^2$ curves, one for $n = 4$, the other for $n = 14$. In general, as $n$ gets larger, the $\chi^2$ curve approaches the normal curve. The normal curve is, in fact, a special case of the $\chi^2$ curve.
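The stated properties of the $\chi^2$ curve (peak at $n - 2$, mean $n$, standard deviation $\sqrt{2n}$, skewness $\sqrt{2/n}$) can be checked numerically from Eq. (2). A rough sketch, assuming a simple midpoint-rule integration and a grid search for the peak (both illustrative choices); `chi2_summary` is a hypothetical helper name.

```python
import math

def chi2_ordinate(x, n):
    """Ordinate of the chi-square curve, Eq. (2), with ((n-2)/2)! = gamma(n/2)."""
    return math.exp(-x / 2) * x ** ((n - 2) / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_summary(n, steps=20000):
    """Mean, standard deviation, and mode of the chi-square curve, found numerically."""
    upper = n + 40 * math.sqrt(2 * n)                   # area beyond this is negligible
    h = upper / steps
    xs = [(i + 0.5) * h for i in range(steps)]          # mid-points of the strips
    ws = [chi2_ordinate(x, n) * h for x in xs]          # strip areas
    mean = sum(x * w for x, w in zip(xs, ws))
    sd = math.sqrt(sum((x - mean) ** 2 * w for x, w in zip(xs, ws)))
    mode = max(xs, key=lambda x: chi2_ordinate(x, n))   # highest ordinate on the grid
    return mean, sd, mode

for n in (4, 14):
    mean, sd, mode = chi2_summary(n)
    skew = (mean - mode) / sd
    print(n, round(mean, 3), round(sd, 3), round(mode, 2), round(skew, 3), round(math.sqrt(2 / n), 3))
```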

The F Distribution.¹ The F distribution is positively skewed like the $\chi^2$ distribution. A picture of the F distribution is shown in Fig. 33. The mathematical formula for the curve is as follows:

$$dP = \frac{\left(\frac{n_1+n_2-2}{2}\right)!\;n_1^{n_1/2}\,n_2^{n_2/2}\,F^{\frac{n_1-2}{2}}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!\,\left(n_1F+n_2\right)^{\frac{n_1+n_2}{2}}}\,dF \qquad (3)$$

in which $n_1$ and $n_2$ are constants, like the $n$ of the $t$ curve and the $\chi^2$ curve, which determine the character of the curve.

The formula for the F curve says that the infinitesimal portion of the total area cut off by the infinitesimal class interval $F$ to $F + dF$ is equal to the area of an infinitesimal rectangle whose height is

$$\frac{\left(\frac{n_1+n_2-2}{2}\right)!\;n_1^{n_1/2}\,n_2^{n_2/2}\,F^{\frac{n_1-2}{2}}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!\,\left(n_1F+n_2\right)^{\frac{n_1+n_2}{2}}}$$

and whose base is $dF$.

The F curve starts at zero, rises to a peak at $F = \dfrac{n_2(n_1-2)}{n_1(n_2+2)}$, and falls again to zero as $F$ goes to infinity. Its mean is $\dfrac{n_2}{n_2-2}$, and its standard deviation is

$$\frac{n_2}{n_2-2}\sqrt{\frac{2(n_2+n_1-2)}{n_1(n_2-4)}}$$

The curve thus varies with $n_1$ and $n_2$. As $n_1$ and $n_2$ get larger, the F curve tends to become more symmetrical; and as $n_1$ and $n_2$ both approach infinity, the F curve approaches the normal curve. If one of the $n$'s approaches infinity while the other remains small, the F curve approaches the $\chi^2$ curve. If $n_1 = 1$, the distribution of $\sqrt{F}$ (taken with both positive and negative signs) is the $t$ distribution with $n = n_2$. Thus the normal curve, the $t$ curve, and the $\chi^2$ curve are all special cases of the F curve.²

1 The reader is warned that the F of the F distribution and the F curve formula has no relationship to the F symbol employed to represent frequencies.

2 As already pointed out on p. 109n., the variable $z = \frac{1}{2}\log_e F$ is sometimes used instead of F. It is this z distribution that R. A. Fisher has called attention to as the one general sampling distribution. Cf. Paul R. Rider, An Introduction to Modern Statistical Methods (1939), who cites J. O. Irwin, "Mathematical Theorems Involved in the Analysis of Variance," Journal of the Royal Statistical Society, Vol. 94 (1931), pp. 287ff.; and R. A. Fisher, "On a Distribution Yielding the Error Functions of Several Well-known Statistics," Proceedings of the International Mathematical Congress (Toronto, 1924), pp. 805-813. The accompanying figure, numbered 34, is a picture of the z distribution for $n_1 = 4$, $n_2 = 3$. It will be noted that the logarithmic transformation produces a much more symmetrical distribution.
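The relation between the $t$ curve and the F curve with $n_1 = 1$ can be verified from Eqs. (1) and (3): if $F = t^2$, the two tails of the $t$ curve fold onto the positive F axis, so the F ordinate should equal the $t$ ordinate at $\sqrt{F}$ divided by $\sqrt{F}$. A minimal sketch, assuming the factorials are evaluated as gamma functions; the function names are illustrative.

```python
import math

def f_ordinate(F, n1, n2):
    """Ordinate of the F curve, Eq. (3), with x! written as gamma(x + 1)."""
    num = math.gamma((n1 + n2) / 2) * n1 ** (n1 / 2) * n2 ** (n2 / 2) * F ** ((n1 - 2) / 2)
    den = math.gamma(n1 / 2) * math.gamma(n2 / 2) * (n1 * F + n2) ** ((n1 + n2) / 2)
    return num / den

def t_ordinate(t, n):
    """Ordinate of the t curve, Eq. (1)."""
    k = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return k * (1 + t * t / n) ** (-(n + 1) / 2)

# With F = t^2: density of F = [t ordinate at +sqrt(F) plus at -sqrt(F)] * d(sqrt F)/dF
#                            = t_ordinate(sqrt(F)) / sqrt(F), since the t curve is symmetrical.
n = 10
pairs = []
for F in (0.25, 1.0, 2.0, 5.0):
    via_f = f_ordinate(F, 1, n)
    via_t = t_ordinate(math.sqrt(F), n) / math.sqrt(F)
    pairs.append((via_f, via_t))
    print(F, round(via_f, 6), round(via_t, 6))
```

The two columns agree to rounding error, which is the exact identity behind the statement that $\sqrt{F}$ with $n_1 = 1$ follows the $t$ distribution.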






CHAPTER VII 

NUMERICAL CALCULATIONS 
FOR FREQUENCY CURVES 

Up to this point the discussion has been primarily concerned 
with the how and the why of frequency curves, i.e., with theory 
per se. This chapter will show how frequency curves may be 
graphed, how frequencies and probabilities may be computed 
from frequency curves, how curves may be fitted to sample data, 
and how the goodness of fit may be tested. Attention will thus center on numerical rather than on abstract calculations.

GRAPHING FREQUENCY CURVES 

It is sometimes desired for instruction purposes or for illustration to make graphs of some of the better known frequency curves. This section will indicate how such graphs may readily be made. The purpose is to show how a curve may be plotted after the constants of the curve have been determined. The problem of determining numerical values for the constants themselves will be discussed in the third section on curve fitting.

The Normal Curve. Probably the easiest curve to graph is the standard normal curve. Tables of the ordinates of this curve have been computed for various abscissa values. Such a table is Table VI in the Appendix. This gives values of

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}$$

for various values of $x/\sigma$. To graph a normal curve it is necessary merely to plot these ordinates at selected values of $x/\sigma$ and draw a curve through the tops of the ordinates. This has been done in Fig. 31. In making a quick freehand graph it may be noted that the curve is symmetrical, its mean and mode come at $x/\sigma = 0$, its points of inflection are at $x/\sigma = \pm 1$, and its tails stretch to plus and minus infinity.

The standard normal curve is a normal curve whose mean is zero and whose standard deviation is 1 (since the abscissa scale is in terms of $x/\sigma$ units). To graph a normal curve with a given mean it is necessary merely to place the zero point of the standard curve at the mean and to adjust the abscissa scale for the difference in standard deviations. Further details and special adjustments are described in the section on testing goodness of fit of a frequency curve.¹
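Where no printed table is at hand, the ordinates that Table VI supplies can be computed directly. In this sketch (function names illustrative) the standard ordinate is read at $x/\sigma$ after moving the zero point to the mean; the sketch also divides the ordinate by $\sigma$, an assumption made here so that the total area stays equal to 1 once the abscissa scale is stretched.

```python
import math

def standard_normal_ordinate(z):
    """Ordinate of the standard normal curve at z = x/sigma."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def normal_ordinate(X, mean, sd):
    """Ordinate of a normal curve with the given mean and standard deviation.

    The zero point of the standard curve is moved to the mean and the abscissa
    is measured in units of sd; dividing by sd keeps the total area equal to 1.
    """
    return standard_normal_ordinate((X - mean) / sd) / sd

for z in (0, 1, 2, 3):
    print(z, round(standard_normal_ordinate(z), 4))
```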

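The t, $\chi^2$, and F Curves. For most other curves no tables of ordinates have been computed. Graphs of these curves must therefore be made directly from the equations for the curves. Most of the equations are such that it is easiest to find the logarithms of the ordinates first and then convert these to antilogarithms. This method will now be illustrated for three important nonnormal sampling distributions, viz., the t curve, the $\chi^2$ curve, and the F curve.

The equations for these curves are

$$y_t = \frac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}\left(1+\frac{t^2}{n}\right)^{-\frac{n+1}{2}} \qquad (1)$$

$$y_{\chi^2} = \frac{e^{-\chi^2/2}\,(\chi^2)^{\frac{n-2}{2}}}{2^{n/2}\left(\frac{n-2}{2}\right)!} \qquad (2)$$

$$y_F = \frac{\left(\frac{n_1+n_2-2}{2}\right)!\;n_1^{n_1/2}\,n_2^{n_2/2}\,F^{\frac{n_1-2}{2}}}{\left(\frac{n_1-2}{2}\right)!\left(\frac{n_2-2}{2}\right)!\,\left(n_1F+n_2\right)^{\frac{n_1+n_2}{2}}} \qquad (3)$$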
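If logarithms are taken to the base 10, these equations become

$$\log y_t = \log\left(\tfrac{n-1}{2}\right)! - \tfrac{1}{2}\log n - \tfrac{1}{2}\log\pi - \log\left(\tfrac{n-2}{2}\right)! - \tfrac{n+1}{2}\log\left(1+\tfrac{t^2}{n}\right) \qquad (1')$$

$$\log y_{\chi^2} = -\tfrac{\chi^2}{2}\log e + \tfrac{n-2}{2}\log\chi^2 - \tfrac{n}{2}\log 2 - \log\left(\tfrac{n-2}{2}\right)! \qquad (2')$$

$$\log y_F = \log\left(\tfrac{n_1+n_2-2}{2}\right)! + \tfrac{n_1}{2}\log n_1 + \tfrac{n_2}{2}\log n_2 + \tfrac{n_1-2}{2}\log F - \log\left(\tfrac{n_1-2}{2}\right)! - \log\left(\tfrac{n_2-2}{2}\right)! - \tfrac{n_1+n_2}{2}\log\left(n_1F+n_2\right) \qquad (3')$$

1 See pp. 137-352.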


In using these equations it is to be noted that $\log_{10} 2 = .301030$, $\log_{10} e = .434294$, and $\log_{10}\pi = .497150$; logarithms of factorials can be obtained from tables of the gamma function¹ given in Tracts for Computers (Cambridge University Press, London, 1921), No. IV. For small values of $n$, $n_1$, and $n_2$ the factorials may easily be computed directly. For example, if $n = 8$, then $\left(\frac{n-2}{2}\right)! = \left(\frac{6}{2}\right)! = 3! = 3 \times 2 \times 1 = 6$. If $n = 7$, then $\left(\frac{n-2}{2}\right)! = \left(\frac{5}{2}\right)! = \frac{5}{2} \times \frac{3}{2} \times \frac{1}{2} \times \sqrt{\pi}$ (see page 255).
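Since $(n-1)! = \Gamma(n)$ for integral and fractional values alike, any of these factorials can be evaluated with a gamma function. A small sketch using Python's `math.gamma` and `math.lgamma`; the helper names `frac_factorial` and `log10_frac_factorial` are illustrative (Python's own `math.factorial` accepts only integers).

```python
import math

def frac_factorial(x):
    """x! for integral or fractional x, using x! = gamma(x + 1)."""
    return math.gamma(x + 1)

def log10_frac_factorial(x):
    """Common logarithm of x!, the quantity a table of log-gamma values supplies."""
    return math.lgamma(x + 1) / math.log(10)

n8 = frac_factorial((8 - 2) / 2)                        # 3! when n = 8
n7 = frac_factorial((7 - 2) / 2)                        # (5/2)! when n = 7
by_hand = 5 / 2 * 3 / 2 * 1 / 2 * math.sqrt(math.pi)    # the product given in the text
print(n8, round(n7, 5), round(by_hand, 5), round(log10_frac_factorial(2.5), 5))
```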


For selected values of $n$, or of $n_1$ and $n_2$, Eqs. (1'), (2'), and (3') will give values of $\log y$ for various values of $t$, $\chi^2$, and $F$, and values of $y$ can be computed by taking antilogarithms. Tables 14 to 16 illustrate this process for each of these three curves, and Figs. 31 to 33 of Chap. VI show the final graphs.²
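The log-and-antilog route of Table 14 is easy to mechanize. The sketch below (function name illustrative) follows Eq. (1') for the $t$ curve with $n = 4$: it computes column (5) as $-\frac{n+1}{2}\log(1 + t^2/n)$, takes its antilogarithm, and multiplies by $K_t$. It uses `math.gamma` for the factorials, an assumption standing in for printed log-gamma tables.

```python
import math

def t_table_rows(n, ts):
    """Columns (1), (3), (5), (7) and (8) of a table like Table 14 for the t curve."""
    K_t = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    rows = []
    for t in ts:
        col3 = 1 + t * t / n                          # 1 + t^2/n
        col5 = -(n + 1) / 2 * math.log10(col3)        # -(n+1)/2 times log (3)
        col7 = 10 ** col5                             # antilogarithm
        rows.append((t, round(col3, 4), round(col5, 5), round(col7, 4), round(K_t * col7, 3)))
    return K_t, rows

K_t, rows = t_table_rows(4, [0.0, 0.2, 1.0, 2.0, 5.0])
print("K_t =", round(K_t, 3))
for row in rows:
    print(row)
```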


1 By definition $(n-1)! = \Gamma(n)$ for both integral and fractional values of $n$.

2 Ordinates for the z distribution may be obtained from the F distribution as follows. For the values of F for which ordinates have been computed, the corresponding values of z may be obtained from the relationship

$$z = \tfrac{1}{2}\log_e F = 1.1513\log_{10} F$$

The z ordinates for these values may be computed by multiplying the corresponding F ordinates by $2e^{2z}$, or the logarithms of the z ordinates may be obtained by adding $\log 2 + \log F$ to the logarithms of the F ordinates.

If the z ordinates are to be computed directly, the equation to be used is

$$\log y_z = \log 2 + \log\left(\tfrac{n_1+n_2-2}{2}\right)! + \tfrac{n_1}{2}\log n_1 + \tfrac{n_2}{2}\log n_2 + n_1 z\log e - \log\left(\tfrac{n_1-2}{2}\right)! - \log\left(\tfrac{n_2-2}{2}\right)! - \tfrac{n_1+n_2}{2}\log\left(n_1e^{2z}+n_2\right)$$

Values of $e^{2z}$ can be found from tables of $e^x$ given in most books of mathematical tables. This was how the figure in the footnote to p. 114 was derived.
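The two routes described in this footnote, rescaling F ordinates by $2e^{2z}$ or evaluating the direct equation, should agree. A sketch using natural logarithms throughout (so the term $n_1 z \log e$ becomes simply $n_1 z$) and gamma functions for the factorials; the function names are illustrative.

```python
import math

def f_ordinate(F, n1, n2):
    """Ordinate of the F curve, Eq. (3)."""
    num = math.gamma((n1 + n2) / 2) * n1 ** (n1 / 2) * n2 ** (n2 / 2) * F ** ((n1 - 2) / 2)
    den = math.gamma(n1 / 2) * math.gamma(n2 / 2) * (n1 * F + n2) ** ((n1 + n2) / 2)
    return num / den

def z_from_f(F, n1, n2):
    """z = (1/2) log_e F, and the z ordinate as the F ordinate times 2e^{2z} (that is, 2F)."""
    z = 0.5 * math.log(F)
    return z, 2 * math.exp(2 * z) * f_ordinate(F, n1, n2)

def z_direct(z, n1, n2):
    """Ordinate of the z curve from the direct equation, in natural logarithms."""
    log_y = (math.log(2) + math.lgamma((n1 + n2) / 2)
             + n1 / 2 * math.log(n1) + n2 / 2 * math.log(n2) + n1 * z
             - math.lgamma(n1 / 2) - math.lgamma(n2 / 2)
             - (n1 + n2) / 2 * math.log(n1 * math.exp(2 * z) + n2))
    return math.exp(log_y)

for F in (0.5, 1.0, 2.0, 9.12):
    z, y = z_from_f(F, 4, 3)
    print(round(z, 4), round(y, 5), round(z_direct(z, 4, 3), 5))
```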



118 GENERAL THEORY OF FREQUENCY CURVES 

Other Curves. The graphing of other frequency curves 
presents about the same sort of problems as those discussed 
above. For example, when the numerical equation for a Gram- 
Charlier curve has been determined, 1 its graphing is readily 
accomplished by the use of the tables of ordinates of the standard 
normal curve and ordinates of its derivatives. These are to be 
found in the Appendix, Table VI. Their use is discussed below 
when the testing of the goodness of fit of a Gram-Charlier curve 
is described. 


Table 14. Calculation of Ordinates of the t Curve (n = 4)

  (1)    (2)      (3)        (4)          (5)              (6)            (7)       (8)*
   t     t²    1 + t²/n    log (3)   −(n+1)/2 × (4)   (5) in another   antilog   K_t × (7)
                                                       form             (6)
   .0    .00    1.0000     .00000       .00000          .00000         1.0000     .375
   .2    .04    1.0100     .00432      −.01080         9.98920 − 10     .9754     .366
   .4    .16    1.0400     .01703      −.04258         9.95742 − 10     .9066     .340
   .6    .36    1.0900     .03743      −.09358         9.90642 − 10     .8062     .302
   .8    .64    1.1600     .06446      −.16115         9.83885 − 10     .6900     .259
  1.0   1.00    1.2500     .09691      −.24228         9.75772 − 10     .5724     .215
  1.2   1.44    1.3600     .13354      −.33385         9.66615 − 10     .4636     .174
  1.4   1.96    1.4900     .17319      −.43297         9.56703 − 10     .3690     .138
  1.7   2.89    1.7225     .23616      −.59040         9.40960 − 10     .2568     .096
  2.0   4.00    2.0000     .30103      −.75258         9.24742 − 10     .1768     .066
  2.5   6.25    2.5625     .40866     −1.02168         8.97832 − 10     .0951     .036
  3.0   9.00    3.2500     .51188     −1.27970         8.72030 − 10     .0525     .020
  4.0  16.00    5.0000     .69897     −1.74742         8.25258 − 10     .0179     .007
  5.0  25.00    7.2500     .86034     −2.15085         7.84915 − 10     .0071     .003

* $K_t = \dfrac{\left(\frac{n-1}{2}\right)!}{\sqrt{n\pi}\,\left(\frac{n-2}{2}\right)!}$, which, for $n = 4$, equals .375. The logarithm of $K_t$ could be added to (5) before taking antilogarithms, but if a calculating machine is available it is probably easier to follow the outline of the table.

When once the type of a Pearsonian curve has been determined and the numerical values of its constants have been computed,² the curve can usually be graphed in much the same way as a t curve, $\chi^2$ curve, or F curve. The process usually consists of taking logarithms, setting up a table to compute the logarithms of the ordinates, and then finding the antilogarithms. The process is illustrated below (pages 148-150).

1 See pp. 133, and 142-144.

2 See pp. 134, and 146-150.


Table 15. Calculation of the Ordinates of the $\chi^2$ Curve (n = 14)

  (1)     (2)      (3)         (4)              (5)               (6)           (7)
  χ²   .2172χ²   log χ²   (n−2)/2 × log χ²   K − (2) + (4)*   (5) in another   antilog
                                                               form             (6)
  4.0    .8688    .6021       3.6126          −2.2205         7.7795 − 10     .0060
  5.0   1.0860    .6990       4.1940          −1.8563         8.1437 − 10     .0139
  6.0   1.3032    .7782       4.6692          −1.5983         8.4017 − 10     .0252
  7.0   1.5204    .8451       5.0706          −1.4141         8.5859 − 10     .0385
  8.0   1.7376    .9031       5.4186          −1.2833         8.7167 − 10     .0521
  9.0   1.9548    .9542       5.7252          −1.1939         8.8061 − 10     .0640
 10.0   2.1720   1.0000       6.0000          −1.1363         8.8637 − 10     .0731
 11.0   2.3892   1.0414       6.2484          −1.1051         8.8949 − 10     .0785
 12.0   2.6064   1.0792       6.4752          −1.0955         8.9045 − 10     .0803
 13.0   2.8236   1.1139       6.6834          −1.1045         8.8955 − 10     .0786
 14.0   3.0408   1.1461       6.8766          −1.1285         8.8715 − 10     .0744
 15.0   3.2580   1.1761       7.0566          −1.1657         8.8343 − 10     .0683
 16.0   3.4752   1.2041       7.2246          −1.2149         8.7851 − 10     .0610
 17.0   3.6924   1.2304       7.3824          −1.2743         8.7257 − 10     .0532
 18.0   3.9096   1.2553       7.5318          −1.3421         8.6579 − 10     .0455
 19.0   4.1268   1.2788       7.6728          −1.4183         8.5817 − 10     .0382
 20.0   4.3440   1.3010       7.8060          −1.5023         8.4977 − 10     .0315
 21.0   4.5612   1.3222       7.9332          −1.5923         8.4077 − 10     .0256
 22.0   4.7784   1.3424       8.0544          −1.6883         8.3117 − 10     .0205
 23.0   4.9956   1.3617       8.1702          −1.7897         8.2103 − 10     .0162
 24.0   5.2128   1.3802       8.2812          −1.8959         8.1041 − 10     .0127
 25.0   5.4300   1.3979       8.3874          −2.0069         7.9931 − 10     .0098
 26.0   5.6472   1.4150       8.4900          −2.1215         7.8785 − 10     .0076
 27.0   5.8644   1.4314       8.5884          −2.2403         7.7597 − 10     .0058

* $K_{\chi^2} = -.1505n - \log\left(\frac{n-2}{2}\right)!$, which, for $n = 14$, equals −4.9643.
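The same mechanical route applies to Table 15. The sketch below (function name illustrative) computes column (5) as $K_{\chi^2} - (2) + (4)$ for $n = 14$ and then its antilogarithm. It computes $K_{\chi^2}$ from $\log 2$ and a log-gamma value rather than from the rounded coefficient .1505, so it comes out as about −4.9645 instead of −4.9643, a difference that changes the ordinates by only about .05 per cent.

```python
import math

def chi2_table_rows(n, xs):
    """Columns (1), (2), (4), (5) and the antilog of (5) for a table like Table 15."""
    K = -n / 2 * math.log10(2) - math.lgamma(n / 2) / math.log(10)   # -(n/2) log 2 - log ((n-2)/2)!
    rows = []
    for x in xs:
        col2 = math.log10(math.e) / 2 * x          # (log e / 2) * chi2, i.e. .2172 chi2
        col4 = (n - 2) / 2 * math.log10(x)         # (n-2)/2 * log chi2
        col5 = K - col2 + col4
        rows.append((x, round(col2, 4), round(col4, 4), round(col5, 4), round(10 ** col5, 4)))
    return K, rows

K, rows = chi2_table_rows(14, [4.0, 12.0, 20.0, 27.0])
print("K =", round(K, 4))
for row in rows:
    print(row)
```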


COMPUTATIONS OF PROBABILITIES 

The computation of relative frequencies or probabilities for 
any frequency curve consists in finding the area under the curve 
for the stated range of values. This may be accomplished 
theoretically by a process of summation, or “integration.” For 
the more important curves employed in sampling analysis these 
integrations have been worked out by the mathematicians and 
the results published in a series of tables. 1 The average student 
1 The integrations are generally complicated and involve a process of 
approximation through the use of rapidly convergent infinite series. 








The Normal Curve. Table VI in the Appendix gives the area under the normal curve between the mean point $x/\sigma = 0$ and selected values of $x/\sigma$. Since the curve is symmetrical, the areas are the same for plus deviations as for minus deviations.

To illustrate the use of the table consider a normal curve whose mean is 100 and whose standard deviation is 10. Since the mean of the standard normal curve is taken as its origin, 100 will in this case be the origin from which deviations will be measured. Since 10 is the standard deviation, the deviations from the mean will be measured in multiples of 10. To find the probability of a case falling in the interval 100 to 120, for example, set

$$\frac{x}{\sigma} = \frac{120 - 100}{10} = 2$$

and note from Table VI of the Appendix that the probability of a normally distributed variate falling between 0 and $2\sigma$ from the mean is .477. Hence the probability of a case falling between 100 and 120 is .477. To find the probability of a case falling between 90 and 100 set

$$\frac{x}{\sigma} = \frac{90 - 100}{10} = -1$$

and note that the probability of a normally distributed variate falling between 0 and $-1\sigma$ (same as between 0 and $+1\sigma$) is .341. Hence the probability of a case falling between 90 and 100 is .341. To find the probability of a case falling between 90 and 120, it is necessary merely to add the probability of a case falling between 90 and 100 to the probability of a case falling between 100 and 120. Thus the probability of a case falling between 90 and 120 is .477 + .341 = .818. To find the probability of a case lying between 110 and 120 it is necessary merely to subtract the probability of a case falling between 100 and 110 from the probability of a case lying between 100 and 120. Since the probability of a case falling between 100 and 110 is the same as the probability of a case falling between 90 and 100, it follows that the probability of a case falling between 110 and 120 is

$$.477 - .341 = .136$$

These computations are pictured in Figs. 35a to 35d.
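The table look-ups above can be reproduced with the error function, since the area under the standard normal curve from the mean to $x/\sigma$ is $\tfrac{1}{2}\operatorname{erf}\!\left(\frac{x/\sigma}{\sqrt 2}\right)$. A sketch using Python's `math.erf`; the function names are illustrative. With unrounded areas the 90 to 120 probability comes out as about .819 rather than .818, because the text adds two table entries already rounded to three places.

```python
import math

def area_from_mean(z):
    """Signed area under the standard normal curve from 0 to z (negative for z < 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

def prob_between(a, b, mean, sd):
    """Probability that a normal variate falls between a and b (a < b)."""
    return area_from_mean((b - mean) / sd) - area_from_mean((a - mean) / sd)

for a, b in ((100, 120), (90, 100), (90, 120), (110, 120)):
    print(a, b, round(prob_between(a, b, 100, 10), 4))
```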

The t Curve. Normal curves may be drawn with different means and different standard deviations, but when the variable has been measured in standard deviation units and is measured from the mean as an origin, all normal curves become one and the same curve, i.e., the standard normal curve. It was therefore possible to construct a simple table which gave areas under the curve (i.e., probabilities) for various $x/\sigma$ deviations from the mean.

The t curve is invariably used in the standard form. Even in that form, however, its shape depends on the parameter $n$. Hence areas under the curve, i.e., the curve probabilities, will be different for different values of $n$. A t table is thus a threefold table, listing the areas or probabilities that correspond to different deviations from the mean value for various values of $n$.

A typical t table will be found in the Appendix, Table VII. In contrast to the normal table, the deviations from the mean are given for selected probabilities, rather than the reverse. These selected probabilities are probabilities of an equal or greater absolute deviation, not probabilities of an equal or smaller deviation, as is true of the normal table. That is, the t table selects areas under the tails of the curve and gives for various values of $n$ the absolute deviations from the mean that will yield these tail areas. Since the curve is symmetrical, the areas are equally divided between the two tails.

Fig. 36. Equal probability areas of .025 in the tails of the t curve for n = 10.

An example will illustrate the use of the table. Suppose the 
parameter n has the value 10, and it is desired to find the absolute 
deviation from the mean that will yield a tail area of .05. To 
find this deviation, locate the row n = 10, and proceed to the 
right until the column marked .05 is reached. The figure so 
located is the deviation desired. It will be seen to have the 
value 2.228. This means that a deviation of +2.228 will mark off an area of .025 on the upper tail and a deviation of −2.228 will mark off an area of .025 on the lower tail, and the two deviations together will mark off an area of .025 on each tail, or a total area of .05. This is illustrated in Fig. 36.
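When no t table is at hand, the tabled deviation can be recovered by finding the area under Eq. (1) numerically and searching for the deviation whose two tails together hold the stated probability. This is a sketch under simple assumptions (Simpson's rule with 2,000 strips and 60 bisection steps, both arbitrary choices); the function names are illustrative.

```python
import math

def t_ordinate(t, n):
    """Ordinate of the t curve, Eq. (1), with factorials as gamma functions."""
    k = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return k * (1 + t * t / n) ** (-(n + 1) / 2)

def area_0_to(x, n, steps=2000):
    """Area under the t curve from 0 to x by Simpson's rule (steps must be even)."""
    h = x / steps
    s = t_ordinate(0, n) + t_ordinate(x, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_ordinate(i * h, n)
    return s * h / 3

def two_tail_area(x, n):
    """P(|t| >= x): total area beyond +x and beyond -x."""
    return 1 - 2 * area_0_to(x, n)

def t_point(p, n, lo=0.0, hi=50.0):
    """Absolute deviation x with P(|t| >= x) = p, found by bisection."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tail_area(mid, n) > p:
            lo = mid          # tails still too large: move outward
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_point(0.05, 10), 3))
```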




The $\chi^2$ Curve. The $\chi^2$ table, Table VIII in the Appendix, is very much like the $t$ table. It is again a threefold table, giving areas (probabilities) and deviations for various values of $n$. Also, like the $t$ table, it lists deviations for selected areas or probabilities, and these areas refer to the portion of the curve beyond the deviation. That is, the areas represent probabilities of an equal or greater deviation. Unlike the $t$ table, however, the deviations are measured from the absolute origin of 0 and not from the mean of the curve.

Fig. 37a. The 2 per cent area in the upper tail of a $\chi^2$ curve, $n = 4$ [.02 is the $P(\chi^2 \ge 11.668)$].

To illustrate the use of the $\chi^2$ table, let $n = 4$, and let the problem be to find the deviation from 0 for which the probability of an equal or greater deviation is .02. To find this deviation locate the row $n = 4$, and proceed to the right until the column headed .02 is reached. The figure so located is the deviation desired. It will be seen to have the value 11.668 (see Fig. 37a).
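For even values of $n$ the area in the upper tail of the $\chi^2$ curve has a closed form, $e^{-\chi^2/2}\sum_{k=0}^{n/2-1}(\chi^2/2)^k/k!$, so the tabled deviation can be found by bisection. This is a sketch restricted to even $n$ (odd $n$ would need numerical integration instead); the function names are illustrative.

```python
import math

def chi2_upper_tail(x, n):
    """P(chi2 >= x) from the closed form exp(-x/2) * sum_{k < n/2} (x/2)^k / k!.

    The closed form holds only for even n.
    """
    if n % 2:
        raise ValueError("this closed form needs an even n")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n // 2))

def chi2_point(p, n, lo=0.0, hi=200.0):
    """Deviation from 0 whose upper-tail area is p, found by bisection."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_upper_tail(mid, n) > p:
            lo = mid          # too much area beyond mid: move outward
        else:
            hi = mid
    return (lo + hi) / 2

print(round(chi2_point(0.02, 4), 3))
```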

Consider another problem. Suppose it is desired to find the deviation from 0 for which the area under the curve is just .02. That is, it is desired to find the deviation for which the probability of an equal or smaller value is just .02. Let $n$ be 4 as before. To find this deviation, locate the row $n = 4$, and proceed to the column headed .98; the figure so located will be the desired deviation. It will be seen to have the value .429. That is, the deviation for which 98 per cent of the curve lies to the
The F Curve. The F table (see Appendix, Table IX) is like the $\chi^2$ table except that it takes account of two parameters, $n_1$ and $n_2$, instead of one. It is thus a fourfold instead of a threefold table. Because of this greater complexity, it gives values for only two tail areas or probabilities, viz., the upper .05 tail and the upper .01 tail.

The use of the F table may be illustrated by the following problem: Let $n_1 = 4$ and $n_2 = 3$, and let it be desired to find the deviation for which the tail area (or probability of an equal or greater value) is .05. To find this deviation, first select the row $n_2 = 3$, and proceed to the right until the column headed $n_1 = 4$ is reached. The lightface figure (the adjacent boldface figure is the ".01 point") so located is the deviation desired. The ".05 point" is seen to be 9.12; it is shown in Fig. 38. If the problem were to find the deviation for which the tail area or probability of an equal or greater value were just .01, then the ".01 point" would have to be used; the rest of the procedure would be the same.

Fig. 38. Probability area of .05 in the upper tail of an F curve, $n_1 = 4$, $n_2 = 3$ [.05 is the $P(F \ge 9.12)$].
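The .05 point can likewise be recovered from Eq. (3). A sketch, assuming $n_1 \ge 2$ so that the ordinate at $F = 0$ is finite, Simpson's rule with 4,000 strips, and bisection; the function names are illustrative.

```python
import math

def f_ordinate(F, n1, n2):
    """Ordinate of the F curve, Eq. (3), with factorials written as gamma functions."""
    num = math.gamma((n1 + n2) / 2) * n1 ** (n1 / 2) * n2 ** (n2 / 2) * F ** ((n1 - 2) / 2)
    den = math.gamma(n1 / 2) * math.gamma(n2 / 2) * (n1 * F + n2) ** ((n1 + n2) / 2)
    return num / den

def upper_tail(x, n1, n2, steps=4000):
    """P(F >= x) as 1 minus the Simpson's-rule area from 0 to x (assumes n1 >= 2)."""
    h = x / steps
    s = f_ordinate(0.0, n1, n2) + f_ordinate(x, n1, n2)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * f_ordinate(i * h, n1, n2)
    return 1 - s * h / 3

def f_point(p, n1, n2, lo=0.0, hi=100.0):
    """Deviation whose upper-tail area is p, found by bisection."""
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, n1, n2) > p:
            lo = mid          # too much area beyond mid: move outward
        else:
            hi = mid
    return (lo + hi) / 2

print(round(f_point(0.05, 4, 3), 2))
```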

Other Curves. Special tables have been derived for the computation of areas and probabilities for curves belonging to the Gram-Charlier system. These may best be explained, however, in later sections¹ dealing with testing the goodness of fit of a Gram-Charlier curve. It can be remarked here that the tables are generally similar to the table of the normal curve.

The tables of the normal, t, $\chi^2$, and F curves, together with the tables used in computing probabilities for a Gram-Charlier curve, comprise the most commonly used tables of probabilities. When tables are lacking, the areas or probabilities must be computed directly or indirectly from the mathematical equation for the curve. If the equation is a simple one, the area may be found by the application of the integral calculus. Often, however, the equations for frequency curves make it difficult to apply the integral calculus, and some approximate method must be employed. If an integraph is available, the curve need only be plotted carefully and the integraph run around the desired area. If an integraph is not at hand, the area may be approximated from some "quadrature" equation that expresses the area of a given interval in terms of the ordinates of the curve for that interval and for the neighboring intervals.

1 See pp. 139-150.

The more important quadrature equations given by W. P. Elderton are as follows:¹

Area under the curve from $x = -\tfrac{1}{2}$ to $x = +\tfrac{1}{2}$ approximates

$$y_0 - \tfrac{1}{24}\left(\Delta y_{-1} - \Delta y_0\right) \qquad (4)$$

or, if greater accuracy is desired, a nearer approximation is obtained by using

$$y_0 - \tfrac{291}{5760}\left(\Delta y_{-1} - \Delta y_0\right) + \tfrac{17}{5760}\left(\Delta y_{-2} - \Delta y_1\right) \qquad (5)$$

In these expressions, $x = 0$ is taken as the middle point of the interval for which the area is to be computed, and $x = -\tfrac{1}{2}$ and $x = +\tfrac{1}{2}$ are the values of the lower and upper limits of the interval. The units are thus class-interval units, and it is in these that the area is measured. The symbol $y_0$ stands for the ordinate of the curve at $x = 0$, that is, at the mid-point of the interval; $\Delta y_0$ means the difference between the ordinate $y_1$ at $x = 1$ (i.e., the ordinate at the mid-point of the next higher interval) and the ordinate $y_0$ at $x = 0$; $\Delta y_1$ means the difference between the ordinate $y_2$ at the mid-point of the second next higher interval and the ordinate $y_1$ at the mid-point of the next higher interval; $\Delta y_{-1}$ means the difference between the ordinate at $x = 0$ and the ordinate at $x = -1$ (i.e., the ordinate at the mid-point of the next lower interval); and $\Delta y_{-2}$ means the difference between the ordinate at the mid-point of the next lower interval and the ordinate at the mid-point of the second next lower interval. In short, if $y_2$, $y_1$, $y_0$, $y_{-1}$, and $y_{-2}$ represent the ordinates at $x = 2$, $x = 1$, $x = 0$, $x = -1$, and $x = -2$, then $\Delta y_1 = y_2 - y_1$, $\Delta y_0 = y_1 - y_0$, $\Delta y_{-1} = y_0 - y_{-1}$, and $\Delta y_{-2} = y_{-1} - y_{-2}$.

1 Frequency Curves and Correlation, pp. 25-26, 48.

To illustrate the use of these equations suppose that ordinates of a frequency curve for five values of the variable are as follows¹ (see Fig. 39):

  X      105     115     125     135     145
  y      1.21    3.40   16.10   47.57   78.87

Before proceeding further it is to be noted that the frequency curve from which these ordinates have been taken was drawn so that the curve would fit a histogram in which the frequency of an interval was represented by the heights of the rectangles and not by their areas. That is, the ordinates are actually one class interval (here the class interval is ten) times greater than they should be if the area under the curve is to be correct. It may also be noted that in this case the curve gives the distribution of 300 cases and is drawn so that the total area is not 1 but 300.

To find the area under the given curve for the interval whose mid-point is 125, set up the following table:

  X      x      y       Δy
  105   −2     1.21    $y_{-1} - y_{-2}$ = 3.40 − 1.21 = 2.19
  115   −1     3.40    $y_0 - y_{-1}$ = 16.10 − 3.40 = 12.70
  125    0    16.10    $y_1 - y_0$ = 47.57 − 16.10 = 31.47
  135    1    47.57    $y_2 - y_1$ = 78.87 − 47.57 = 31.30
  145    2    78.87

This is shown graphically in Fig. 39. 

By using Eq. (4) a first approximation to the area will be given by

$$16.10 - \tfrac{1}{24}(12.70 - 31.47) = 16.10 + .78 = 16.88$$

1 Actually, these are the ordinates of a Gram-Charlier curve fitted to the 
weights of 300 Princeton freshmen (see pp. 143-145). It will be interesting 
to compare the area calculated here with that obtained from the tables of 
probabilities for a Gram-Charlier curve (see p. 147). 
Equation (5) gives as a second approximation¹

$$16.10 - \tfrac{291}{5760}(12.70 - 31.47) + \tfrac{17}{5760}(2.19 - 31.30) = 16.96$$


Since the original ordinates were one class interval times as large as they should be (if the area under the curve was to equal the total frequency), there is no need in this case to multiply the 16.88 or 16.96 by the class interval. Hence the final answers are 16.88 and 16.96 cases. If these are divided by 300 (the total number of cases), the results are .0563 and .0565, which are the first and second approximations to the probability (relative frequency) that a case falls in the interval 120-130.

Fig. 39. Five ordinates of the Gram-Charlier curve fitted to the weights of 300 Princeton freshmen.
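Equations (4) and (5) are simple enough to check by machine. The sketch below (function name illustrative) forms the first differences from five ordinates and applies both formulas to the ordinates listed above, taking the ordinate at 145 as 78.87.

```python
def elderton_areas(y):
    """First and second approximations, Eqs. (4) and (5), to the area over the
    interval centred on y[2], where y = [y_-2, y_-1, y_0, y_1, y_2] are ordinates
    at the mid-points of five consecutive class intervals (class-interval units)."""
    ym2, ym1, y0, y1, y2 = y
    d_m2, d_m1, d_0, d_1 = ym1 - ym2, y0 - ym1, y1 - y0, y2 - y1        # first differences
    first = y0 - (d_m1 - d_0) / 24                                       # Eq. (4)
    second = y0 - 291 / 5760 * (d_m1 - d_0) + 17 / 5760 * (d_m2 - d_1)   # Eq. (5)
    return first, second

ordinates = [1.21, 3.40, 16.10, 47.57, 78.87]   # X = 105, 115, 125, 135, 145
a4, a5 = elderton_areas(ordinates)
print(round(a4, 2), round(a5, 2))               # areas, in cases
print(round(a4 / 300, 4), round(a5 / 300, 4))   # relative frequencies
```

As a check on the formula itself, the ordinates of $y = x^4$ at $x = -2, \ldots, 2$ (that is, 16, 1, 0, 1, 16) give exactly 1/80 from Eq. (5), the true area from $-\tfrac12$ to $\tfrac12$, as expected for a formula built on a fourth-degree parabola.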


It will be noted that the initial term in each of Eqs. (4) and (5) is the ordinate of the curve at the mid-point of the interval for which the area is to be calculated. This means that the area of the rectangle whose height is the ordinate of the curve at the mid-point of the interval and whose base is the class interval may be taken as a first approximation to the area under the curve over that interval. That is, it is assumed that the curve can be roughly approximated for the given interval by the horizontal straight line $y = y_0$. The extra terms added in Eq. (4) imply that the curve can be better approximated by a second-degree parabola drawn through the ordinates of the curve at $x = 1$, $x = 0$, and $x = -1$, that is, through the ordinates $y_1$, $y_0$, and $y_{-1}$. Equation (5) assumes that a still better approximation to the curve can be obtained by a fourth-degree parabola drawn through the ordinates of the curve at $x = 2$, $x = 1$, $x = 0$, $x = -1$, and $x = -2$, that is, through $y_2$, $y_1$, $y_0$, $y_{-1}$, and $y_{-2}$.

Elderton also gives equations, based upon these same degrees of approximation, that express the area over an interval in terms of the ordinates of the curve at the ends instead of the mid-points of that interval and of neighboring class intervals.

1 The area computed later from a table of probabilities for a Gram-Charlier curve is 16.98, which shows how good this approximation is.


FITTING FREQUENCY CURVES 

To fit a frequency curve to a given set of sample data means 
to determine numerical values for the constants of the curve 
equation. This is usually done by relating the curve constants 
to the various statistics of the data. The problem will now be 
considered with reference to the more important types of curves 
that are fitted to sample data. 

The Normal Curve. When the variable is expressed as an absolute deviation from the zero origin, the equation for the normal curve is

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(X - \bar X)^2}{2\sigma^2}\right] \qquad (6)$$

where $\bar X$ and $\sigma$ are the mean and standard deviation of the curve. It would appear, therefore, that the normal curve could be fitted to a set of sample data by merely putting the mean of the data and the standard deviation of the data in the curve equation. This is essentially the method that is used. A particular adjustment must be made, however, whenever the sample standard deviation is computed from grouped data. This adjustment will now be discussed.

When data are grouped for the purpose of calculating a mean, standard deviation, or other statistic, a certain arbitrary distortion may result from considering all the cases of a given class interval as being concentrated at the center of the interval or, what amounts to the same thing, as being uniformly distributed throughout the interval. If there were enough cases in an interval, it would probably be found that the frequencies of the cases in the interval would actually form a smooth curve that tended to rise toward the center of the distribution (see Fig. 40a) instead of forming a horizontal line as assumed by the grouping of the data (see Fig. 40b).

The error that results from grouping is generally negligible in the case of odd-powered moments such as the mean, since errors in measuring positive deviations tend to be offset by errors in measuring negative deviations. For even-powered


1 See Elderton, W. P., op. cit., pp. 25-26, 48.





132 GENERAL THEORY OF FREQUENCY CURVES 

moments, however, such as the variance, the errors of measurement are cumulative and result in a net error due to grouping. W. F. Sheppard¹ has put the error in the variance at $i^2/12$, where $i$ is the size of the class interval.

Before the normal curve is fitted to a sample histogram, therefore, the variance of the sample, when calculated from grouped data, must be corrected for grouping by subtracting $i^2/12$. The square root of the corrected variance will give the corrected standard deviation. When the mean of the data and this corrected value for the standard deviation are put in the general formula for the normal curve, the result will be a normal curve that has the same mean and same standard deviation as the given data.²

Fig. 40a. Actual distribution of frequencies in a class interval.
Fig. 40b. Assumed distribution of frequencies in a class interval.
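A minimal Python sketch of the procedure just described: the mean and variance of grouped data are computed with every case placed at the mid-point of its interval (divisor $N$), and $i^2/12$ is subtracted from the variance before the square root is taken. The function name and the three-interval data set are illustrative only.

```python
import math


def grouped_mean_and_sd(midpoints, frequencies, class_interval, sheppard=True):
    """Mean and standard deviation of grouped data.

    Every case is placed at the mid-point of its class interval; the variance
    uses divisor N. With sheppard=True, i**2 / 12 is subtracted from the
    variance before the square root is taken.
    """
    n = sum(frequencies)
    mean = sum(f * x for x, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, frequencies)) / n
    if sheppard:
        variance -= class_interval ** 2 / 12       # Sheppard's correction for grouping
    return mean, math.sqrt(variance)


# Made-up grouped data: mid-points 1, 2, 3 with frequencies 1, 2, 1 (i = 1).
mean, sd = grouped_mean_and_sd([1, 2, 3], [1, 2, 1], class_interval=1)
```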

Whether the curve actually fits the data depends on how truly normal the data are. The goodness of fit can usually be made evident by plotting the normal curve on the same chart as the histogram of the data.³ If the fit or lack of fit is somewhat doubtful, various methods may be employed to test the goodness of fit. The $\chi^2$ test of goodness of fit is discussed below.⁴

Nonnormal Curves. The reason for fitting a normal curve to 
a given set of data is to determine whether the data are or are not 
normally distributed. If the distribution is normal, then the 

1 See Proceedings of the London Mathematical Society, Vol. 29 (1898), p. 353.

2 In fitting a frequency curve the data are generally very numerous, so the multiplication of the sample variance by $N/(N - 1)$ to improve the estimate of the population variance affects the result but little and can therefore be neglected.

3 See pp. 137-139.

4 See pp. 139-150.
normal table can be used to calculate probabilities and the results
of sampling can be predicted by the carefully worked out sampling 
theory applicable to normal populations. 

When data are distinguished as nonnormal, there may be 
further advantages in fitting a nonnormal curve to the data. 
Such a curve, for example, may serve to “smooth” the histogram 
and may thus permit a more accurate determination of the relative
frequencies of the population from which the sample was
taken. The identification of the distribution of given data with 
a particular frequency curve may also serve to distinguish them 
from other data the distribution of which is identified with a 
different frequency curve. The “fitting” of frequency curves 
may thus help to classify data as to type. These are the two 
principal reasons for fitting nonnormal frequency curves. 

A Gram-Charlier Curve. The usual type of nonnormal curve fitted to data of everyday life is either a Pearsonian curve or a Gram-Charlier curve. Since the latter is the easier to fit, it will be discussed first.

In the case of the normal curve, the curve constants were themselves commonly computed statistics, viz., the mean and standard deviation of the data (the latter corrected for errors of grouping). In the case of a Gram-Charlier curve, the general frequency equation is¹

$$y = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(X - \bar X)^2}{2\sigma^2}\right]\left\{1 + A\left[3\,\frac{X - \bar X}{\sigma} - \frac{(X - \bar X)^3}{\sigma^3}\right] + B\left[3 - 6\,\frac{(X - \bar X)^2}{\sigma^2} + \frac{(X - \bar X)^4}{\sigma^4}\right]\right\} \qquad (7)$$

where

$$A = -\frac{\mu_3}{3!\,\sigma^3} \qquad\text{and}\qquad B = \frac{\mu_4 - 3\sigma^4}{4!\,\sigma^4}$$

Hence the constants of the curve are again functions of commonly computed statistics, viz., the mean $\bar X$, the standard deviation $\sigma$, the third moment $\mu_3$, and the fourth moment $\mu_4$. Therefore, a Gram-Charlier curve can be fitted to a set of sample data by substituting the sample values for the population values $\bar X$, $\sigma$, $\mu_3$, and $\mu_4$ of the curve equation.
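Since $A$ and $B$ depend only on $\sigma$, $\mu_3$, and $\mu_4$, Eq. (7) can be evaluated directly once the moments are known. In the Python sketch below (function names are ours), the moments are those of the weight data used later in this section; the resulting $A \approx -.1048$ and $B \approx .0697$ are the two multipliers that appear in the computation of Table 18.

```python
import math


def gram_charlier_constants(sigma, mu3, mu4):
    """A and B of Eq. (7)."""
    A = -mu3 / (math.factorial(3) * sigma ** 3)
    B = (mu4 - 3 * sigma ** 4) / (math.factorial(4) * sigma ** 4)
    return A, B


def gram_charlier_density(X, mean, sigma, mu3, mu4):
    """Ordinate of the curve of Eq. (7) at X (total area under the curve is 1)."""
    A, B = gram_charlier_constants(sigma, mu3, mu4)
    z = (X - mean) / sigma
    normal = math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))
    return normal * (1 + A * (3 * z - z ** 3) + B * (3 - 6 * z ** 2 + z ** 4))


# Sheppard-corrected moments of the weight data quoted later in this section.
sigma = math.sqrt(318.094)
A, B = gram_charlier_constants(sigma, 3566.48, 472727)
```

Because the two bracketed correction terms integrate to zero against the normal curve, the fitted curve keeps unit area whatever the values of $A$ and $B$, although it can dip below zero in the tails when they are large.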

If the sample values of $\bar X$, $\sigma$, etc., have been computed from grouped data, as is usually the case, then a correction for grouping must be made in the case of the even-power moments $\sigma^2 = \mu_2$ and $\mu_4$. Sheppard's corrections in these instances are

$$\mu_2 = \mu_2\,(\text{uncorrected}) - \frac{i^2}{12}$$

$$\mu_4 = \mu_4\,(\text{uncorrected}) - \frac{i^2}{2}\,\mu_2\,(\text{uncorrected}) + \frac{7}{240}\,i^4 \qquad (8)$$

where the $\mu$'s on the left stand for the corrected moments and those marked "uncorrected" for the moments computed directly from the grouped data.

1 See pp. 92-99.
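Eqs. (8) in code, as a small Python sketch with a hypothetical function name. The demonstration groups a standard normal curve into intervals of width .5: the uncorrected moments come out too large, and the corrections bring them back to $\mu_2 = 1$ and $\mu_4 = 3$. For a smooth curve such as the normal the agreement is very close; for rough histograms the corrections are only approximate.

```python
import math


def sheppard_corrections(mu2_raw, mu4_raw, class_interval):
    """Eqs. (8): corrected mu2 and mu4 from central moments of grouped data."""
    i2 = class_interval ** 2
    mu2 = mu2_raw - i2 / 12
    mu4 = mu4_raw - mu2_raw * i2 / 2 + 7 * i2 ** 2 / 240   # uses the uncorrected mu2
    return mu2, mu4


def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Group a standard normal curve (true mu2 = 1, mu4 = 3) into intervals of
# width .5 centered on 0, -.5, .5, ...; the mean stays at 0 by symmetry.
i = 0.5
mids = [i * k for k in range(-20, 21)]
probs = [normal_cdf(m + i / 2) - normal_cdf(m - i / 2) for m in mids]
raw2 = sum(p * m ** 2 for m, p in zip(mids, probs))
raw4 = sum(p * m ** 4 for m, p in zip(mids, probs))
mu2, mu4 = sheppard_corrections(raw2, raw4, i)
```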

The goodness of fit of the curve to the data may be examined by simply making a graphic comparison of the curve and the sample histogram to which it is fitted, or a $\chi^2$ test may be undertaken. These tests of goodness of fit are described and illustrated below.

A Pearsonian Curve. Karl Pearson, it will be recalled,¹ found that frequency curves in general might be represented by an equation of the type

$$\text{Relative slope} = \frac{x - a}{b_0 + b_1 x + b_2 x^2} \qquad (9)$$


The logical basis upon which this system of curves rests was discussed in Chap. IV. It is the purpose here to describe the fitting of such a curve.

The first step is to relate the constants $a$, $b_0$, $b_1$, and $b_2$ of Eq. (9) to various statistics computed from the sample data. The method by which this is accomplished is explained by W. P. Elderton in his Frequency Curves and Correlation (pages 39 to 40). The essence of the procedure is to determine $a$, $b_0$, $b_1$, and $b_2$ so that the first four moments of the fitted curve will equal the first four moments of the data. The algebra is elaborate and will not be repeated here. When the results obtained by Elderton's analysis are substituted in Eq. (9), it becomes²


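$$\text{Relative slope} = -\,\frac{x + \dfrac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}}{\dfrac{\mu_2(4\beta_2 - 3\beta_1) + \sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)\,x + (2\beta_2 - 3\beta_1 - 6)\,x^2}{2(5\beta_2 - 6\beta_1 - 9)}} \qquad (10)$$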

where $x = X - \bar X$, $\beta_1 = \mu_3^2/\mu_2^3$, and $\beta_2 = \mu_4/\mu_2^2$; $\sqrt{\beta_1}$ is to have the same sign as $\mu_3$.

The Pearsonian equation is now expressed in terms of the moments and the $\beta$ coefficients of the curve, and the equation can be fitted by substituting the moments of the sample data for the curve moments. The values of the second and fourth moments must again be adjusted for Sheppard's corrections if they have originally been computed from grouped data [see Eqs. (8)].

The Pearsonian equation (9), it will be recalled,¹ yields different types of curves, depending on the value of the criterion $\kappa = b_1^2/4b_0b_2$. When the $b$'s are given their values from Eq. (10), this criterion becomes²

$$\kappa = \frac{\beta_1(\beta_2 + 3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} \qquad (11)$$

The types of curves distinguished on the basis of this equation are as follows:³

Value of criterion                            Curve type
κ = 0, β₁ = 0, β₂ > 3                         VII
κ = 0, β₁ = 0, β₂ = 3                         Normal curve
κ = 0, β₁ = 0, β₂ < 3                         IIa
κ = 0, β₁ = 0, β₂ < 1.8                       IIb
0 < κ < 1                                     IV
κ = 1                                         V
1 < κ < ∞                                     VI
κ = ∞, that is, 2β₂ − 3β₁ − 6 = 0             III
κ < 0                                         I

1 See Chap. IV (p. 57).

2 Elderton's method involves the assumption that $x^n(b_0 + b_1x + b_2x^2)\,y$ vanishes at the ends of the range of the distribution, i.e., that there is close contact with the x-axis at both ends.

3 It will be noted that the mode of the curve is that point at which the relative slope is zero. Equation (10) thus shows that the mode of a Pearsonian curve comes at

$$x = -\frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

Since $x = X - \bar X$,

$$\text{Mo} = \bar X - \frac{\sqrt{\mu_2}\,\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

where $\sqrt{\beta_1}$ is to be given the same sign as $\mu_3$. Since skewness can be measured by $(\bar X - \text{Mo})/\sigma$,

$$\text{Skewness} = +\frac{\sqrt{\beta_1}\,(\beta_2 + 3)}{2(5\beta_2 - 6\beta_1 - 9)}$$

1 See Chap. IV (p. 57).

2 See Elderton, op. cit., p. 42.

3 Taken from Tables for Statisticians and Biometricians, Part I, p. lxiii.
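The constants implied by Eq. (10) can be computed and checked in a few lines. This Python sketch (names ours) moves the leading minus sign of Eq. (10) into its denominator, so that the equation takes the form $(x - a)/(b_0 + b_1x + b_2x^2)$ with $a$ equal to the mode measured from the mean. It uses the weight-data values $\mu_2 = 318.094$, $\beta_1 = .4063$, and $\beta_2 = 4.6720$ quoted later in the section, and checks that $b_1^2/4b_0b_2$ agrees with Eq. (11).

```python
import math


def pearson_coefficients(mu2, beta1, beta2, mu3_sign=1.0):
    """Constants of (x - a) / (b0 + b1*x + b2*x**2) implied by Eq. (10).

    x is measured from the mean, and sqrt(beta1) takes the sign of mu3.
    """
    root_b1 = math.copysign(math.sqrt(beta1), mu3_sign)
    d = 2 * (5 * beta2 - 6 * beta1 - 9)
    c = math.sqrt(mu2) * root_b1 * (beta2 + 3)
    a = -c / d                                  # relative slope is zero here: the mode
    b0 = -mu2 * (4 * beta2 - 3 * beta1) / d
    b1 = -c / d
    b2 = -(2 * beta2 - 3 * beta1 - 6) / d
    return a, b0, b1, b2


def relative_slope(x, a, b0, b1, b2):
    return (x - a) / (b0 + b1 * x + b2 * x * x)


# Weight data: mean 151.8, mu2 = 318.094, beta1 = .4063, beta2 = 4.6720.
a, b0, b1, b2 = pearson_coefficients(318.094, 0.4063, 4.6720)
mode = 151.8 + a
kappa = b1 ** 2 / (4 * b0 * b2)                 # criterion of the Pearson types
```

For a normal curve ($\beta_1 = 0$, $\beta_2 = 3$) the same function reduces the relative slope to $-x/\mu_2$, which is a quick check on the signs.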
It is clear at once from Fig. 41 what type of curve is yielded by usual values of $\beta_1$ and $\beta_2$; it is unnecessary in most instances to go to the trouble of calculating the criterion value $\kappa$.* The reader should be warned, however, that, if sample values of $\beta_1$ and $\beta_2$ are near a border line, there may be a good chance, owing to sampling errors, that the population data are of the type lying on the other side of the border rather than of the type indicated by the sample $\beta$'s themselves.

Thus the type of Pearsonian curve that fits a given set of data may be found immediately by substitution of the corrected $\beta$ coefficients in the criterion formula or by the location of the values on Fig. 41. The equation for the curve (in differential form) will be obtained by substituting the corrected moments and $\beta$ coefficients in Eq. (10), as was indicated above. Since this is a differential equation that gives the relative slope of a curve at any point and not the actual ordinate, it is necessary to derive an equation for the ordinates of a curve before the curve can be graphed. This problem will be discussed more fully in the next section on testing goodness of fit.

* This chart also reveals in a striking fashion the more or less arbitrary or, perhaps better, technical mathematical basis for the Pearsonian classification of curves. As developed in Chap. IV, the logical classification is simply the normal curve, the type III curve, and the others. Cf. Tables for Statisticians and Biometricians, Part I, p. lxi.
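The table of curve types amounts to a small decision rule on $\beta_1$ and $\beta_2$. The Python sketch below (names ours) uses a tolerance for the equality cases. It does not separate the two symmetric type II rows of the table, and, as the warning above notes, a sample point near a boundary may in fact belong to the neighboring type.

```python
import math


def pearson_kappa(beta1, beta2):
    """Criterion of Eq. (11); infinite when its denominator vanishes."""
    den = 4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6)
    if den == 0:
        return math.inf
    return beta1 * (beta2 + 3) ** 2 / den


def pearson_type(beta1, beta2, tol=1e-9):
    """Curve type from the criterion table (both type II rows reported as "II")."""
    if abs(beta1) < tol:                          # symmetric curves: kappa = 0
        if abs(beta2 - 3) < tol:
            return "normal"
        return "VII" if beta2 > 3 else "II"
    if abs(2 * beta2 - 3 * beta1 - 6) < tol:      # kappa infinite
        return "III"
    kappa = pearson_kappa(beta1, beta2)
    if kappa < 0:
        return "I"
    if abs(kappa - 1) < tol:
        return "V"
    return "IV" if kappa < 1 else "VI"
```

With the weight-data values $\beta_1 = .4063$ and $\beta_2 = 4.6720$ used later in this section, `pearson_type` returns "IV", the type found there.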

Fitting Sampling Curves. Sometimes laboratory experiments are undertaken to test certain sampling theories. These yield empirical sampling distributions. Sampling theory may suggest that these empirical distributions should conform to certain theoretical sampling distributions. To test this theory, or possibly just to see how well the empirical distribution is approximated by some well-known theoretical distribution, a particular sampling curve is fitted to the empirical distribution. In such instances the process of fitting is relatively simple. To fit the $t$ curve, $\chi^2$ curve, or $F$ curve it is necessary merely to determine the proper values for $n$, or for $n_1$ and $n_2$, and then plot the curve as described in the first section. The goodness of fit can be determined by graphic comparison or by a $\chi^2$ test such as is described in the next section.


TESTING GOODNESS OF FIT

Testing a Normal Curve. Two methods of testing whether a normal curve fits a given set of sample data will be described here. These will be simple graphic comparison and the $\chi^2$ test. Other methods of testing for normality are described in Chap. XVI.

Graphic Comparison. In the first section of this chapter the graphing of a standard normal curve was explained. This consisted merely in plotting the ordinates of the standard curve at selected values of $x/\sigma$. The problem now will be to adjust the abscissa scale and the curve ordinates so that the curve will fit a histogram with a given mean and a given standard deviation. The process is as follows:

First find what the mid-points of the histogram class intervals are in terms of standard deviation units measured from the mean as an origin. This can be done by subtracting the mean from each of the mid-values and dividing these deviations by the (corrected) standard deviation. For the values of $x/\sigma$ so obtained, the ordinates of the standard normal curve may be found from Table VI in the Appendix. If the histogram with which the normal curve is to be compared is such that relative frequencies are measured by the areas of the rectangles erected on each interval, then the ordinates of the standard curve should be divided by $\sigma$ to allow for the fact that the abscissa scale for the histogram is in absolute units while the abscissa scale for the standard curve is in standard deviation units.¹ If the histogram is such that absolute frequencies are measured by the heights of the rectangles erected on each interval, then the ordinates of the standard curve must be multiplied by $Ni/\sigma$, where $i$ is the size of the class interval and $N$ the number of cases.² When the standard ordinates have been properly adjusted, they can be plotted on the same graph as the given histogram and the goodness of fit may be examined visually. Fits that are obviously good and bad may be determined in this way.

1 See pp. 137-152.

Table 17a. Calculation of the Ordinates of the Normal Curve That Fits the Distribution of Heights of 300 Princeton Freshmen

  (1)        (2)          (3)          (4)               (5)
   X      X − X̄ = x      x/σ       Ordinate of      Col. (4) × Ni/σ
                                   standard curve
  62.5      −7.97        −3.22       0.00224             0.27
  63.5      −6.97        −2.82       0.00748             0.91
  64.5      −5.97        −2.42       0.02134             2.59
  65.5      −4.97        −2.01       0.05292             6.43
  66.5      −3.97        −1.59       0.11270            13.69
  67.5      −2.97        −1.19       0.19652            23.87
  68.5      −1.97        −0.80       0.28969            35.19
  69.5      −0.97        −0.39       0.36973            44.91
  70.5       0.03         0.01       0.39892            48.45
  71.5       1.03         0.42       0.36526            44.36
  72.5       2.03         0.82       0.28504            34.62
  73.5       3.03         1.22       0.18954            23.02
  74.5       4.03         1.63       0.10567            12.83
  75.5       5.03         2.04       0.04980             6.05
  76.5       6.03         2.44       0.02033             2.47
  77.5       7.03         2.84       0.00707             0.86

  X̄ = 70.47        σ (corrected) = 2.47

1 The area of the histogram in this case is one regular unit, while the area of the standard curve is one standard deviation unit. To make the two equal, the ordinates of the standard normal curve must all be divided by $\sigma$.

2 The area of this histogram is $\sum Fi = Ni$. Hence, after the ordinates have been adjusted for the difference in abscissa scales by division by $\sigma$, as explained above, they must be multiplied by $Ni$ to take account of the fact that the area of this second form of the histogram is $Ni$ absolute units and not one unit.

The whole procedure of graphically comparing a histogram with a fitted normal curve is illustrated by Table 17a, and the result obtained is shown in Fig. 42a. For this example, the fit is clearly a good one. It must be concluded, therefore, that the heights of young men of approximately the same age are normally distributed.

The $\chi^2$ Test. When it is difficult to tell by graphic comparison whether a normal curve is a good fit to a given set of data, some more refined method must be used. One such method is the $\chi^2$ test, so called because it makes use of the $\chi^2$ distribution in determining the goodness of fit. The essence of the test is a numerical comparison of the frequencies of the curve and histogram, interval by interval. The procedure is as follows:

To calculate the curve probabilities for each interval it is first necessary to express the limits of these intervals in terms of standard deviation units measured from the mean as an origin. Thus $x/\sigma$ for each class limit can be found by subtracting the mean from the class limit and dividing by the corrected standard deviation. The process is illustrated in columns (1), (2), and (3) of Table 17b.
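The two standardizing steps used so far, scaling normal ordinates by $Ni/\sigma$ for the graphic comparison of Table 17a and converting class limits to $x/\sigma$ for the $\chi^2$ test, can be sketched in Python as follows (function names are ours). Table 17a rounds $x/\sigma$ to two decimals before looking up the ordinate, so the unrounded values computed here can differ slightly from its column (5).

```python
import math


def standard_normal_ordinate(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def to_sigma_units(values, mean, sigma):
    """Express mid-points or class limits as deviations x/sigma from the mean."""
    return [(v - mean) / sigma for v in values]


def normal_curve_heights(midpoints, mean, sigma, n_cases, class_interval):
    """Standard ordinates times N*i/sigma, comparable with a histogram whose
    bar heights are absolute frequencies."""
    scale = n_cases * class_interval / sigma
    return [scale * standard_normal_ordinate(z)
            for z in to_sigma_units(midpoints, mean, sigma)]


# Heights data of Table 17a: mean 70.47, corrected sigma 2.47, N = 300, i = 1 inch.
heights = normal_curve_heights([62.5 + k for k in range(16)], 70.47, 2.47, 300, 1)
```

Summed over enough intervals, the scaled heights add up to $N$, which is the point of the factor $Ni/\sigma$.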

The next step is to find from the area table for the normal curve the probabilities, or relative frequencies, of cases lying between the mean and the various class limits. Then the relative frequencies, or probabilities, for each interval may be found by a process of subtraction, and these may be converted to absolute frequencies by multiplying by the number of cases $N$. This step is illustrated in columns (4), (5), and (6) of Table 17b.

If it should turn out that the curve frequency for any interval is less than 5, it should be combined with a neighboring interval or intervals so that the frequency of every interval is at least 5. Such adjustments are almost always necessary at the ends of the curve. In this connection it should be pointed out that the end intervals should always be taken as running to $\pm\infty$, so that the total frequency for the curve will be the same as that for the histogram.


The final step is to compute the quantity $(F - f)^2/f$ for each interval and then sum for all intervals. Here $f$ stands for the curve frequency and $F$ for the histogram frequency. The value of

$$\sum \frac{(F - f)^2}{f}$$

is the final criterion of goodness of fit.

As explained more fully below,¹ if normal curves are fitted to many sample histograms from a truly normal population, the various sample values of $\sum (F - f)^2/f$ will tend to form a sampling distribution that is of the form of a $\chi^2$ distribution with $n$ equal to the number of class intervals for which comparisons are made (a combined interval is treated as a single interval), minus 3. Hence, a selected point of the $\chi^2$ table for the proper value of $n$, say the .05 point, will serve as a critical value for $\sum (F - f)^2/f$. For if the population from which the sample is taken is truly normal, there are only 5 chances out of 100 that the value of $\sum (F - f)^2/f$ will equal or exceed the .05 $\chi^2$ value. Hence, values of $\sum (F - f)^2/f$ greater than this .05 value suggest that the population is not truly normal; they are an index of bad fit.

1 See pp. 331-333.
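Both the criterion and the tail probability that would be read from a $\chi^2$ table can be computed directly. The Python sketch below assumes an integer $n$ and uses the standard recurrence $P(n + 2) = P(n) + (x/2)^{n/2}e^{-x/2}/\Gamma(n/2 + 1)$, starting from $\operatorname{erfc}\sqrt{x/2}$ for $n = 1$ and $e^{-x/2}$ for $n = 2$. The function names are our own; the $\chi^2$ survival function of any statistics library would serve equally well.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (F - f)**2 / f over the (possibly combined) class intervals."""
    return sum((F - f) ** 2 / f for F, f in zip(observed, expected))


def chi_square_tail(x, n):
    """P(chi-square with n degrees of freedom >= x), for integer n >= 1."""
    if n % 2 == 0:
        q, k = math.exp(-x / 2), 2              # start from n = 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1   # start from n = 1
    while k < n:                                # step up two degrees of freedom at a time
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


# Normal-curve example below: 3.867 with n = 11 - 3 = 8.
p_normal = chi_square_tail(3.867, 8)
```

For that example the function gives about .87, inside the .80 to .90 range read from the table.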


An illustration of the procedure for calculating $\sum (F - f)^2/f$ is given in Table 17b. The value of $\sum (F - f)^2/f$ there obtained supports the graphic analysis in showing that the normal curve is a good fit to the heights of young college men. The $\chi^2$ table shows that for $n = 11 - 3$ (i.e., the number of class intervals¹ less three), the probability of as great a value as 3.867 is between .80 and .90, which indicates a good fit.

A Gram-Charlier Curve. The goodness of fit of a Gram-Charlier curve may also be tested by graphic comparison and by a $\chi^2$ test. These will now be described.

Graphic Comparison. The graphing of a Gram-Charlier curve is almost as easy as the graphing of a normal curve. Equation (7) shows that the ordinates of a Gram-Charlier curve are equal to the normal ordinates plus two different multiples of these ordinates, viz.,

$$-\frac{\mu_3}{3!\,\sigma^3}\left(\frac{3x}{\sigma} - \frac{x^3}{\sigma^3}\right) \qquad\text{and}\qquad \frac{\mu_4 - 3\sigma^4}{4!\,\sigma^4}\left(3 - \frac{6x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)$$

where $x = X - \bar X$. Graphing of the Gram-Charlier ordinates is facilitated by use of tables of the normal ordinate [called $\varphi_0(x/\sigma)$], the normal ordinate times $(3x/\sigma - x^3/\sigma^3)$ [called $\varphi_3(x/\sigma)$], and the normal ordinate times $(3 - 6x^2/\sigma^2 + x^4/\sigma^4)$ [called $\varphi_4(x/\sigma)$]. These values will be found in the Appendix, Table VI.

To get the ordinates of a particular Gram-Charlier curve it is necessary merely to multiply the values given for $\varphi_3(x/\sigma)$ by $-\mu_3/3!\sigma^3$ and the values given for $\varphi_4(x/\sigma)$ by $(\mu_4 - 3\sigma^4)/4!\sigma^4$, and to add the algebraic sum of these (or to subtract if the sum is negative) to the values given for $\varphi_0(x/\sigma)$.* When the ordinates so computed are multiplied by $Ni/\sigma$, they will become immediately comparable with the sample histogram from which the moments were computed, and the two may be plotted on the same graph.


1 See p. 333 for discussion of the basis for selecting $n$.

* If $\mu_2$ and $\mu_4$ have been calculated from grouped data, they must be adjusted for Sheppard's corrections before substituting in these equations (see p. 134).
To illustrate the fitting of a Gram-Charlier curve and the graphing of its ordinates, consider the distribution of weights of the 300 Princeton freshmen, shown in Fig. 42b. The mean of this distribution is 151.8 pounds, its standard deviation is 17.8 pounds, and the values of $\mu_2$, $\mu_3$, and $\mu_4$ for this distribution are $\mu_2 = 318.094$, $\mu_3 = 3{,}566.48$, and $\mu_4 = 472{,}727$, Sheppard's corrections for grouping being applied in all cases. The equation for the Gram-Charlier curve that fits this distribution is thus

Fig. 42b. A Gram-Charlier curve and a Pearsonian curve fitted to the distribution of weights of 300 Princeton freshmen.
The steps taken to obtain the ordinates of this curve at the mid-points of various class intervals are indicated in Table 18. Thus in column (1) are written the values of the mid-points of the intervals, in column (2) their deviations from the mean of the distribution, and in column (3) the measure of these deviations in standard deviation units. In columns (4), (5), and (6) are entered the values of $\varphi_0(x/\sigma)$, $\varphi_3(x/\sigma)$, and $\varphi_4(x/\sigma)$ given in Table VI of the Appendix¹ for the values of $x/\sigma$ listed in column (3). In column (7) is entered the product of $-\mu_3/6\sigma^3$ ($= -.1048$) times column (5), and in column (8) the product of $(\mu_4 - 3\sigma^4)/24\sigma^4$ ($= .0697$) times column (6). Column (9) contains the algebraic sums of the items of columns (4), (7), and (8); and in column (10) these sums are multiplied by $Ni/\sigma = 168.2$. These final figures are the ones plotted in Fig. 42b. Column (11) contains the ordinates of the histogram.
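The procedure of Table 18 can be written as a short Python function (names ours): compute $x/\sigma$, obtain $\varphi_0$, $\varphi_3$, and $\varphi_4$ (here computed from their definitions rather than read from Table VI), combine them with the multipliers $-\mu_3/6\sigma^3$ and $(\mu_4 - 3\sigma^4)/24\sigma^4$, and scale by $Ni/\sigma$. The grid of mid-points below runs well beyond the observed weights only so that the check on total frequency is meaningful.

```python
import math


def phi0(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def phi3(z):
    return phi0(z) * (3 * z - z ** 3)


def phi4(z):
    return phi0(z) * (3 - 6 * z ** 2 + z ** 4)


def gram_charlier_heights(midpoints, mean, sigma, mu3, mu4, n_cases, class_interval):
    c3 = -mu3 / (6 * sigma ** 3)                      # multiplies column (5)
    c4 = (mu4 - 3 * sigma ** 4) / (24 * sigma ** 4)   # multiplies column (6)
    scale = n_cases * class_interval / sigma           # N*i/sigma, used in column (10)
    heights = []
    for X in midpoints:
        z = (X - mean) / sigma                         # column (3)
        heights.append(scale * (phi0(z) + c3 * phi3(z) + c4 * phi4(z)))
    return heights


sigma = math.sqrt(318.094)
mids = [155 + 10 * k for k in range(-20, 21)]          # 10-lb intervals, wide range
heights = gram_charlier_heights(mids, 151.8, sigma, 3566.48, 472727, 300, 10)
```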

Table 18. Computation of the Ordinates of the Gram-Charlier Curve That Fits the Distribution of Weights of 300 Princeton Freshmen



The $\chi^2$ Test. The $\chi^2$ test of goodness of fit of a Gram-Charlier curve is the same as the $\chi^2$ test for a normal curve. Frequencies of curve and histogram are compared interval by interval, and the value $\sum (F - f)^2/f$ is computed and compared with the .05 point of a $\chi^2$ table. The procedure will be illustrated with reference to the distribution of weights discussed in the previous section.

1 It is to be noted that for negative values of $x/\sigma$ the signs of the $\varphi_3(x/\sigma)$ entries in the table must be reversed.

Before the procedure itself is described, a brief explanation of how to compute relative frequencies or probabilities (areas) for a Gram-Charlier curve is in order. To calculate the area under a Gram-Charlier curve for a given range of $x/\sigma$ values it is necessary first to compute the normal area and then adjust for the effects of the $\varphi_3(x/\sigma)$ and $\varphi_4(x/\sigma)$ terms. The adjustment required by the $\varphi_3(x/\sigma)$ term is given by the product of three factors, $x^2/\sigma^2 - 1$, the ordinate of the normal curve for $x/\sigma$, and $-\mu_3/6\sigma^3$. The adjustment required by the $\varphi_4(x/\sigma)$ term is simply $(\mu_4 - 3\sigma^4)/24\sigma^4$ times $\varphi_3(x/\sigma)$. (These adjustments follow by integration, since the derivative of $\varphi_0(z)(z^2 - 1)$ is $\varphi_3(z)$ and the derivative of $\varphi_3(z)$ is $\varphi_4(z)$.) Thus to get the area under a Gram-Charlier curve from $-\infty$ to $x/\sigma$, write down the area given by Table VI of the Appendix [this table gives the area between the mean 0 and $x/\sigma$; to find the area from $-\infty$ to $x/\sigma$, subtract the table area from .5 if $x/\sigma$ is negative, or add it to .5 if $x/\sigma$ is positive], and add to it algebraically the value of

$$\left(\frac{x^2}{\sigma^2} - 1\right)\varphi_0\!\left(\frac{x}{\sigma}\right)\left(-\frac{\mu_3}{6\sigma^3}\right) \qquad\text{plus the value of}\qquad \frac{\mu_4 - 3\sigma^4}{24\sigma^4}\,\varphi_3\!\left(\frac{x}{\sigma}\right)$$

To calculate the area between any two values of $x/\sigma$, calculate the areas from $-\infty$ to each of these values, and take the difference.
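The area rule just stated is easy to code. This Python sketch (names ours) uses `math.erf` for the normal area instead of Table VI. With the rounded multipliers $-.1048$ and $.0697$ and the upper class limits of the weight data in $x/\sigma$ units, it reproduces the cumulative areas of column (11) of Table 19 to within about .0002, and the successive differences times 300 give the curve frequencies of the class intervals.

```python
import math


def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def phi0(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


def gram_charlier_cdf(z, c3, c4):
    """Area under the Gram-Charlier curve from -infinity to z = x/sigma.

    c3 = -mu3 / (6 sigma**3) and c4 = (mu4 - 3 sigma**4) / (24 sigma**4).
    """
    adj3 = c3 * (z * z - 1) * phi0(z)          # phi3 term: (z^2 - 1) * ordinate * c3
    adj4 = c4 * phi0(z) * (3 * z - z ** 3)     # phi4 term: c4 * phi3(z)
    return normal_cdf(z) + adj3 + adj4


# Upper class limits of the weight data in sigma units, with the rounded multipliers.
limits = [-1.78, -1.22, -0.66, -0.10, 0.46, 1.02, 1.58, 2.14, 2.70]
areas = [gram_charlier_cdf(z, -0.1048, 0.0697) for z in limits]
class_freqs = [300 * (hi - lo) for lo, hi in zip([0.0] + areas, areas + [1.0])]
```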

Table 19 illustrates the procedure just outlined by showing how the areas under the Gram-Charlier curve fitted to the weights of the 300 Princeton freshmen are determined for various class intervals. Thus columns (1) to (3) convert the original class limits into $x/\sigma$ deviations from the mean. Column (4) gives the areas under the normal curve from $-\infty$ to the upper limit of each class interval, now measured in $x/\sigma$ units. Columns (5), (6), (7), and (8) show the computation of the adjustment required in each case by the $\varphi_3(x/\sigma)$ term [the final adjustment here is entered in column (8)], and columns (9) and (10) show the computation of the adjustment required by the $\varphi_4(x/\sigma)$ term [final adjustment given in column (10)]. Column (11) is the sum of the items of columns (4), (8), and (10) and represents the final areas from $-\infty$ to the given values of $x/\sigma$. In column (12) the areas for the individual class intervals are computed by taking the successive differences between the items in column (11). These figures give the proportion of the total area contained in each interval. As indicated in the headings of the table, the sample values of $\mu_2$, $\mu_3$, and $\mu_4$ are taken to be the parameter values, so the areas computed refer to the fitted curve.

The remaining part of Table 19 is concerned with testing the goodness of fit of the curve to the given histogram. Thus in column (13) the relative frequencies given in column (12) (areas under a frequency curve, it will be recalled, measure relative frequencies) are multiplied by $N = 300$ to give absolute frequencies that are comparable with the frequencies of the sample histogram. Then columns (14) to (17) carry out the calculations necessary to compute the value of $\sum (F - f)^2/f$. This is found to be 11.6675. A $\chi^2$ table shows that for $n = 10 - 5$ (i.e., the number of class intervals¹ minus 5) the probability of as great a value as 11.6675 is between .02 and .05, which does not indicate a very good fit. Apparently the Gram-Charlier type curve is not flexible enough in this instance to adjust itself to the sharp contours of the histogram. It is possible that a Pearsonian type curve might give better results; this will be examined in the next section.

A Pearsonian Curve. Graphic comparison and the $\chi^2$ test can be used to test the goodness of fit of a Pearsonian curve as well as of a normal or Gram-Charlier curve. The following section will therefore be devoted to the problems peculiar to a Pearsonian curve and will avoid as far as possible duplication of previous discussion.

Finding the Curve Equation. The Pearsonian equation (9), as noted above, is not an equation for the frequency curve itself, but one for its relative slope.² The former can be readily obtained from the latter, however, by the use of the integral calculus. Although it is possible in any practical case to substitute the computed values of the moments in Eq. (9) and then find the frequency curve to which it gives rise, it is generally easier to carry out the transformation (integration) first and then substitute the values of the moments in the resulting equation. The details of the process of integration and the development of curve equations for the various types of curves are explained in detail in W. P. Elderton, Frequency Curves and Correlation. Their use will again be illustrated by reference to the data on the weights of 300 Princeton freshmen. The $\mu$'s and $\beta$'s for these data are $\mu_2 = 318.094$, $\mu_3 = 3{,}566.48$, $\mu_4 = 472{,}727$, $\beta_1 = .4063$, and $\beta_2 = 4.6720$. On substitution in Eq. (11), it is found that the criterion of curve type $\kappa$ equals .161, which indicates a type IV curve. This is also indicated by Fig. 41.

1 See p. 333 for a fuller discussion of the basis for selecting $n$.

2 See p. 57.

Table 19. Test of Goodness of Fit of a Gram-Charlier Curve to the Distribution of Weights of 300 Princeton Freshmen¹

Columns:
(1) Upper limit of class interval*
(2) x = X − X̄
(3) x/σ
(4) Area under normal curve, −∞ to x/σ
(5) x²/σ² − 1
(6) φ₀(x/σ)
(7) (5) × (6)
(8) (7) × (−μ₃/6σ³)
(9) φ₃(x/σ)
(10) (9) × (μ₄ − 3σ⁴)/24σ⁴
(11) Sum of (4), (8), and (10)
(12) First differences of (11)
(13) f = 300 × (12)
(14) F
(15) F − f
(16) (F − f)²
(17) (F − f)²/f

 (1)    (2)     (3)    (4)      (5)     (6)     (7)     (8)     (9)     (10)    (11)    (12)    (13)  (14)   (15)      (16)      (17)
 120  −31.8   −1.78   .0376   2.1684  .0819   .1776  −.0186  +.0242  +.0017   .0207   .0207    6.21    7   +.79     .6241     .1005
 130  −21.8   −1.22   .1113    .4884  .1896   .0926  −.0097  −.3494  −.0243   .0773   .0566   16.98   20  +3.02    9.1204     .5371
 140  −11.8    −.66   .2546   −.5644  .3209  −.1811  +.0190  −.5426  −.0378   .2358   .1585   47.55   52  +4.45   19.8025     .4165
 150   −1.8    −.10   .4602   −.9900  .3970  −.3930  +.0412  −.1187  −.0083   .4931   .2573   77.19   61 −16.19  262.1161    3.3957
 160    8.2     .46   .6772   −.7884  .3588  −.2829  +.0296  +.4599  +.0320   .7388   .2457   73.71   73   −.71     .5041     .0068
 170   18.2    1.02   .8460    .0404  .2372   .0096  −.0010  +.4735  +.0330   .8780   .1392   41.76   47  +5.24   27.4576     .6575
 180   28.2    1.58   .9429   1.4964  .1145   .1713  −.0179  +.0912  +.0064   .9314   .0534   16.02   25  +8.98   80.6404    5.0337
 190   38.2    2.14   .9838   3.5796  .0405   .1450  −.0152  −.1364  −.0095   .9591   .0277    8.31    6  −2.31    5.3361     .6421
 200   48.2    2.70   .9965   6.2900  .0104   .0654  −.0069  −.1207  −.0084   .9812   .0221    6.63    5  −1.63    2.6569     .4007
  ∞                  1.0000                    0       0       0       0     1.0000   .0188    5.64    4  −1.64    2.6896     .4769
                                                                                                              Σ (F − f)²/f = 11.6675

1 Several class intervals have been combined at the ends so that the curve frequencies in these intervals will not be too small (see p. 140 and fuller discussion on pp. 309, 326).
* Upper limits of class intervals.

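If the integration is carried out, it will be found that the formula for a type IV curve is¹

$$y = y_0\left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu\tan^{-1}(x/a)} \qquad (12)$$

where

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{2\beta_2 - 3\beta_1 - 6}, \qquad m = \tfrac{1}{2}(r + 2), \qquad \nu = \frac{r(r - 2)\sqrt{\beta_1}}{\sqrt{16(r - 1) - \beta_1(r - 2)^2}},$$

$$a = \sqrt{\frac{\mu_2}{16}}\,\sqrt{16(r - 1) - \beta_1(r - 2)^2}, \qquad y_0 = \frac{N}{a\,F(r, \nu)},$$

$F(r, \nu)$ is a special function of $r$ and $\nu$, and the origin is at the mean plus $\nu a/r$. An alternative equation is²

$$y = y_0 \cos^{r+2}\theta\; e^{-\nu\theta} \qquad (13)$$

where $\theta = \tan^{-1}(x/a)$ and $\theta$ in $e^{-\nu\theta}$ is measured in radians. The function $F(r, \nu)$ may be evaluated for various values of $r$ and $\nu$ from Karl Pearson's Tables for Statisticians and Biometricians, Table LIV, explained on pages lxxxi to lxxxiii.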

1 See Elderton, op. cit., p. 64.

2 See ibid., p. 65.

Equation (13) is the easier form to employ in plotting ordinates, although it does not give ordinates at equal $x$ intervals. If the latter is essential, as it might be if the ordinates were to be used to compute areas, then Eq. (12) must be used. Equation (12) will be employed here. The calculations required for determining the curve equation are as follows:

Since $\beta_1 = .4063$ and $\beta_2 = 4.6720$, then, on substituting these
sample values for the population parameters which they serve to
estimate, it follows that

$$r = \frac{6(4.6720 - .4063 - 1)}{2(4.6720) - 3(.4063) - 6} = 9.2204$$

$$m = \tfrac{1}{2}(9.2204 + 2) = 5.6102$$

$$v = \frac{9.2204(9.2204 - 2)\sqrt{.4063}}{\sqrt{16(9.2204 - 1) - .4063(9.2204 - 2)^2}} = 4.0398$$

$$a = \sqrt{\frac{318.094}{16}}\,\sqrt{16(9.2204 - 1) - .4063(9.2204 - 2)^2} = 46.8374$$
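The four substitutions can be checked with a short Python sketch. It applies the type IV formulas for r, m, v, and a given earlier. The function name is our own, and it assumes that the sample values of $\mu_2$, $\beta_1$, and $\beta_2$ stand in for the population values, as the text does.

```python
import math


def type_iv_parameters(mu2: float, beta1: float, beta2: float):
    """Return (r, m, v, a) for a Pearson type IV curve."""
    r = 6 * (beta2 - beta1 - 1) / (2 * beta2 - 3 * beta1 - 6)
    m = (r + 2) / 2
    # Common factor sqrt(16(r - 1) - beta1 (r - 2)^2) shared by v and a
    root = math.sqrt(16 * (r - 1) - beta1 * (r - 2) ** 2)
    v = r * (r - 2) * math.sqrt(beta1) / root
    a = math.sqrt(mu2 / 16) * root
    return r, m, v, a


r, m, v, a = type_iv_parameters(318.094, 0.4063, 4.6720)
print(f"r = {r:.4f}, m = {m:.4f}, v = {v:.4f}, a = {a:.4f}")
```

At the fourth decimal place, a may differ in the last digit from the hand computation, because of rounding in the intermediate steps.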

To find the value of F(r,v) to be used in computing $y_0$, resort
must be had to Karl Pearson's Tables for Statisticians and Biometricians,
Table LIV. The first step is to find $\phi$, defined by the relationship
$\tan\phi = v/r$. For the given data,

$$\tan\phi = \frac{4.0398}{9.2204} = .43814 \qquad \text{and} \qquad \phi = 23.6601^\circ$$

For $\phi = 23^\circ$ and r = 9, Pearson's Tables give log F(r,v) =
9.2186422 - 10; for $\phi = 24^\circ$, r = 9, log F(r,v) = 9.2488237
- 10. Hence, for $\phi = 23.6601^\circ$ and r = 9, the value of
log F(r,v) is, by straight-line interpolation, 9.2385650 - 10. Again,
for $\phi = 23^\circ$ and r = 10, log F(r,v) = 9.2347504 - 10; for
$\phi = 24^\circ$ and r = 10, log F(r,v) = 9.2686087 - 10. Interpolation
gives, for $\phi = 23.6601^\circ$ and r = 10, log F(r,v) = 9.2571003
- 10. Finally, interpolation between 9.2385650 - 10 and
9.2571003 - 10 gives, for $\phi = 23.6601^\circ$ and r = 9.2204, log
F(r,v) = 9.2426502 - 10 = -.7573498. Consequently,
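The double straight-line interpolation, followed by the step to $y_0$, can be sketched in Python as below. The only table values it uses are the four entries of log F(r,v) quoted above. It does not reproduce Pearson's Table LIV. The helper name `linear` is hypothetical. Because $\phi$ is computed here directly from v/r and not from the rounded .43814, the last digit or two of log F may differ slightly from the hand result.

```python
import math


def linear(x0: float, y0: float, x1: float, y1: float, x: float) -> float:
    """Straight-line interpolation between (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


# The four entries of log F(r, v) quoted in the text (written as 9.xxx - 10)
log_F = {
    (9, 23): 9.2186422 - 10, (9, 24): 9.2488237 - 10,
    (10, 23): 9.2347504 - 10, (10, 24): 9.2686087 - 10,
}

r, v, a, N = 9.2204, 4.0398, 46.8374, 300
phi = math.degrees(math.atan(v / r))  # about 23.660 degrees

# Interpolate in phi at r = 9 and r = 10, then in r
at_r9 = linear(23, log_F[(9, 23)], 24, log_F[(9, 24)], phi)
at_r10 = linear(23, log_F[(10, 23)], 24, log_F[(10, 24)], phi)
log_F_rv = linear(9, at_r9, 10, at_r10, r)  # about -0.75735

# log y0 = log N - log a - log F(r, v)
log_y0 = math.log10(N) - math.log10(a) - log_F_rv
y0 = 10 ** log_y0
print(f"phi = {phi:.4f}, log F = {log_F_rv:.7f}, y0 = {y0:.2f}")
```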

$$\log y_0 = \log 300 - \log 46.8374 - (-.7573498) = 1.5638783,$$

and $y_0 = 36.63$. The formula for the curve is thus

$$y = 36.63\left[1 + \frac{x'^2}{(46.8374)^2}\right]^{-5.610} e^{-4.040\tan^{-1}(x'/46.8374)},$$

the origin being at $\frac{4.0398}{9.2204}(46.8374)$ plus the mean, or at
20.5 + 151.8 = 172.3.
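To compute ordinates from the fitted curve, measure x′ from the origin at 172.3. A minimal Python sketch follows, using the rounded constants from the formula above. The function name `ordinate` is our own. Its results should agree with the ordinates listed in Table 21 to within about .1.

```python
import math

# Constants of the fitted type IV curve (rounded as in the text)
Y0, A, M, V = 36.63, 46.8374, 5.610, 4.040
ORIGIN = 172.3  # mean (151.8) plus va/r (20.5)


def ordinate(weight: float) -> float:
    """Ordinate of the fitted curve at a given weight."""
    x = weight - ORIGIN  # x' is measured from the origin of the curve
    return Y0 * (1 + (x / A) ** 2) ** (-M) * math.exp(-V * math.atan(x / A))


for w in (105, 135, 155, 175, 205):
    print(w, round(ordinate(w), 1))
```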


Table 21. Computation of Areas from Ordinates of Table 20

   x       y      Δy                     Area or frequency
  105*    3.3     7.7 - 3.3 = 4.4        3.3
  115     7.7    17.3 - 7.7 = 9.6        7.7 - (1/24)(4.4 - 9.6) = 7.7 + .2 = 7.9
  125    17.3    35.2 - 17.3 = 17.9     17.3 - (1/24)(9.6 - 17.9) = 17.3 + .3 = 17.6
  135    35.2    59.9 - 35.2 = 24.7     35.2 - (1/24)(17.9 - 24.7) = 35.2 + .3 = 35.5
  145    59.9    74.6 - 59.9 = 14.7     59.9 - (1/24)(24.7 - 14.7) = 59.9 - .4 = 59.5
  155    74.6    59.8 - 74.6 = -14.8    74.6 - (1/24)(14.7 + 14.8) = 74.6 - 1.2 = 73.4
  165    59.8    28.5 - 59.8 = -31.3    59.8 - (1/24)(-14.8 + 31.3) = 59.8 - .7 = 59.1
  175    28.5     8.4 - 28.5 = -20.1    28.5 - (1/24)(-31.3 + 20.1) = 28.5 + .5 = 29.0
  185     8.4     1.8 - 8.4 = -6.6       8.4 - (1/24)(-20.1 + 6.6) = 8.4 + .6 = 9.0
  195     1.8      .3 - 1.8 = -1.5       1.8 - (1/24)(-6.6 + 1.5) = 1.8 + .2 = 2.0
  205      .3      .1 - .3 = -.2          .3 - (1/24)(-1.5 + .2) = .3 + .1 = .4
  215*     .1                             .1

* Areas of these intervals are taken equal to ordinates.
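Each entry in the last column of Table 21 corrects the mid-point ordinate by one twenty-fourth of the second difference, $y_k - \tfrac{1}{24}(\Delta y_{k-1} - \Delta y_k)$. We take this to be the quadrature rule referred to as Eq. (4), but that reading comes from the table's arithmetic and has not been checked against the equation itself. A minimal Python sketch follows. As in the table, the two end intervals are left out, since their areas are simply taken equal to their ordinates.

```python
def areas_from_ordinates(y):
    """Interior-interval areas: y_k - (1/24)(dy_{k-1} - dy_k)."""
    dy = [b - a for a, b in zip(y, y[1:])]  # forward differences
    return [y[k] - (dy[k - 1] - dy[k]) / 24 for k in range(1, len(y) - 1)]


# Ordinates at x = 105, 115, ..., 215, from Table 21
ordinates = [3.3, 7.7, 17.3, 35.2, 59.9, 74.6, 59.8, 28.5, 8.4, 1.8, 0.3, 0.1]
areas = areas_from_ordinates(ordinates)
print([round(s, 1) for s in areas])
# [7.9, 17.6, 35.5, 59.5, 73.4, 59.1, 29.0, 9.0, 2.0, 0.4]
```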


Table 22. The χ² Test of Goodness of Fit of the Curve

Weight class     F (observed)   f (curve)   F - f    (F - f)²    (F - f)²/f
Below 110              2           6.5*      -4.5      20.25        3.12
110-                   5           7.9       -2.9       8.41        1.06
120-                  20          17.6        2.4       5.76         .33
130-                  52          35.5       16.5     272.25        7.67
140-                  61          59.5        1.5       2.25         .04
150-                  73          73.4        -.4        .16         .00
160-                  47          59.1      -12.1     146.41        2.48
170-                  25          29.0       -4.0      16.00         .55
180 and above†        15          11.5        3.5      12.25        1.06
Sum                  300         300.0                             16.31

* Taken to make the total area equal to 300, it being assumed that the area above 220 is zero.

† Consolidated so that the total area for the interval of comparison is at least 5.

Graphic Comparison. From this equation the ordinates of
the frequency curve y may be computed as indicated in Table 20.
It will be noted that the x′ values are taken as the mid-points
of the various class intervals. The final results are to be found
in column (10); the ordinates of the histogram to which the curve
is fitted are given in column (11) for the sake of comparison.
The curve and histogram are graphed in Fig. 42.

The χ² Test of Goodness of Fit. To test the goodness of fit, the
areas under the curve for the various class intervals may be
computed from the quadrature equation (4) or (5).* This has been
done in Table 21 by using Eq. (4). The final results are compared
with the histogram frequencies in Table 22, and a χ² test
of goodness of fit is carried out. This table yields

$$\chi^2 = \sum \frac{(F - f)^2}{f} = 16.31.$$

For n = 9 - 5 = 4,† the .01 point of a χ² distribution is 13.277,
which indicates that for the case in question the probability of a
value of Σ(F - f)²/f equal to or greater than 16.31 is less than
.01. Hence the fit is not an especially good one. Apparently,
owing to the sharp peak in the center, the given data cannot be
well fitted by a smooth frequency curve.
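The χ² statistic can be recomputed directly from the F and f columns of Table 22. For 4 degrees of freedom the upper-tail probability has the closed form $P(\chi^2 \ge x) = e^{-x/2}(1 + x/2)$, which needs no table. The sketch below, in Python, uses that form, and it also confirms that 13.277 is the .01 point. The function names are our own.

```python
import math

observed = [2, 5, 20, 52, 61, 73, 47, 25, 15]                    # F
expected = [6.5, 7.9, 17.6, 35.5, 59.5, 73.4, 59.1, 29.0, 11.5]  # f


def chi_square(F, f):
    """Sum of (F - f)^2 / f over the classes."""
    return sum((o - e) ** 2 / e for o, e in zip(F, f))


def chi2_sf_df4(x):
    """P(chi-square with 4 degrees of freedom >= x) = exp(-x/2)(1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)


chi2 = chi_square(observed, expected)
print(f"chi-square = {chi2:.2f}, P = {chi2_sf_df4(chi2):.4f}")
print(f"P at 13.277 = {chi2_sf_df4(13.277):.4f}")
```

The closed form applies only when the degrees of freedom are even (here 4). For other cases, a table or a library routine such as SciPy's `chi2.sf` would be needed.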




* See p. 128. 

† For explanation, see p. 333.







PART II 


Elementary Theory of Random Sampling 

CHAPTER VIII 

A PREVIEW OF SAMPLING THEORY 

An important part of statistical theory is concerned with the 
logical basis of inferences about a population, i.e., about a large 
set of cases, from which a sample has been taken. This is called 
the “theory of sampling.” 

In many instances a study of the whole population is imprac¬ 
ticable, if not impossible. The set of children born to members 
of the white race goes back to the dim past and, so far as we know, 
will continue into the unknown future. For all practical purposes
it is an infinite population of children. Any study of
this population, for example, a study of the percentage of male 
and female births, must be made from a sample. Again, the 
registered voters in the United States form a population of many 
millions. If an institute of public opinion wishes to discover 
the trend of political sentiment in the country, it might con¬ 
ceivably approach every one of these millions of voters and ask 
him what party or candidate he favors. Such a “straw vote,” 
however, would necessitate considerable preliminary preparation 
and would be very costly, and the tabulation of returns would be 
an elaborate clerical and statistical problem; only the United 
States Bureau of the Census could venture to undertake such a 
job. A small private institute must necessarily determine public 
opinion from a carefully selected sample.

For similar reasons, sampling is employed in many other fields.
It is used to study qualities of manufactured products, yields of
various agricultural techniques, results of different medical 
treatments, effects of suggested educational methods, and the 
like. Sampling analysis is not confined to human populations 
but may be applied to the study of populations of inanimate 
objects, plants, and animals; it has general applicability. Statisticians
often speak of a “universe” instead of a “population”;
the two words are interchangeable in sampling analysis.

RANDOM SAMPLING 

Random Sampling and Probability. If a sample of data is 
obtained in a manner that may be characterized as “random,” 
it is possible to make certain inferences about the population 
from which the sample has been drawn. With respect to a 
random sample the calculus of probability and the theory of 
frequency curves developed in previous chapters may be applied 
with a reasonable degree of accuracy. In random sampling, 
for example, it is possible to compute the probability of wrong¬ 
fully rejecting a given hypothesis regarding the population. It 
is also possible to calculate ranges of values that may be stated 
to cover the actual population values with a given probability. 1 
The randomness of a sample thus provides the essential basis
on which logical inferences about the population can rest.

Sampling that is purposive and not random will be discussed 
briefly at the end of this chapter. Samples obtained by this 
method may be “thought” to be good representations of the 
population, but just how good is indeterminate; nor can it be 
determined by the use of this method how often incorrect 
inferences regarding the population may occur. Random 
sampling is the only method so far devised that permits logical 
inferences about a given population. 2 

Types of Populations. Before discussing the technique of 
random sampling it is desirable to distinguish several different 
types of populations. A population may be either “existent” 
or “hypothetical.” The registered voters in the United States, 
the stock of machine parts in a given storeroom, the wool sheep 
on a given ranch are all existent populations. The set of heads 

1 Testing of hypotheses and determination of “confidence intervals” are
discussed briefly in a subsequent section of this chapter (see pp. 163-174) 
and developed more fully in the chapters that follow. 

2 It must be admitted, however, that confidence in an inference based on a 
random sample is dependent on the “thought” or firm belief that it is a 
truly random one. Whether thought with respect to randomness is any 
sounder, as a basis for inferences, than thought with respect to representative¬ 
ness of a sample obtained by some other method is a debatable question. 
and tails to be obtained by the indefinite tossing of a given coin,
the children that have been and will be born to members of the 
white race, the results of the repeated performance of a given 
physical, biological, or other type of scientific experiment are all 
hypothetical populations. In some cases an existent population 
is part of a larger hypothetical population; for example, the 
machine parts in a storeroom, which of themselves form an 
existent population, may be viewed as a portion of the hypo¬ 
thetical population of parts that would be produced by the 
indefinite continuation of the given manufacturing process. 
Again, hypothetical populations may sometimes be the products 
of random selection from given existent populations. The balls 
in a bag, for example, form an existent population, but the balls 
that might be drawn from the bag with replacement would
form a hypothetical population. 

Populations may also be classed as “finite” or “infinite.” 
The body of registered voters in the United States is a finite 
population, although it is so large that for some purposes it may 
be considered infinite. The set of heads and tails obtained by 
the endless tossing of a coin, the continuous births of white 
children, the results of an indefinite repetition of any physical 
or biological process—all constitute infinite populations. It 
is possible, of course, for a hypothetical population to be finite 
or infinite. The first 100 heads and tails to be obtained from 
the tossing of a coin is a finite hypothetical population. Existent 
populations, however, are almost universally finite. 1 

Technique of Random Sampling. As indicated in the chapter 
on probability, randomness is largely a matter of intuition. 
Probability theory considers the set of all possible different 
samples and derives their distribution according to some criterion. 
If probability theory is to be used in predicting the results of any 
method of sampling, the method should be such that, if repeated 
a large number of times, it will tend to yield all possible different 
samples with equal frequency. Such a method is called a 
“random” method. 
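The property that defines a random method can be checked empirically on a known population, as the text goes on to suggest. The Python sketch below uses a hypothetical five-member population and samples of two drawn without replacement, which gives 10 possible samples. Python's `random.sample` stands in for the selection method under test, and we count how often each possible sample turns up.

```python
import random
from collections import Counter
from itertools import combinations


def sample_frequencies(population, n, trials, seed=0):
    """Count how often each (unordered) sample of size n is drawn."""
    rng = random.Random(seed)
    return Counter(tuple(sorted(rng.sample(population, n))) for _ in range(trials))


population = ["A", "B", "C", "D", "E"]  # hypothetical population
TRIALS = 100_000
counts = sample_frequencies(population, 2, TRIALS)
possible = list(combinations(population, 2))  # all 10 possible samples

for s in possible:
    print(s, counts[s] / TRIALS)  # each should be close to 1/10
```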

1 The foregoing classification of populations follows closely that of G. 
Udny Yule and M. G. Kendall, Introduction to the Theory of Statistics, pp.
332-334. Also, see M. G. Kendall and B. Babington Smith, “Randomness 
and Random Sampling Numbers,” Journal of the Royal Statistical Society, 
Vol. 101 (1938), pp. 147-166. 
General notions regarding the concept of probability suggest
that, if every member of the population is given an equal chance 
of being selected, or, to put it another way, if the selection of any 
member of the population is independent of the attribute it 
assumes, the results may be those desired. There is no definite 
assurance, however, that any special technique will conform to 
these criteria. All that can be done, and all that has been done, 
is to apply to a known population some apparently random 
method and to compare the results obtained with those expected 
upon the basis of probability theory. If the actual and theoret¬ 
ical results agree reasonably well when applied to the known 
population, it is concluded that the given method will also yield 
random samples when applied to an unknown population. The 
danger exists, nevertheless, that a method of selection yielding 
random samples in one instance may not give random samples 
in another instance. In the final analysis, belief that a particular 
method will produce random results rests on intuition guided 
by past experience. Some of the methods that have been 
devised to obtain random samples are ordinal selection, mechan¬ 
ical randomizing devices, tables of numbers, random sampling 
numbers, and natural selection. 

Ordinal Selection. The methods of selection described in this 
and the next three sections are methods devised to obtain random 
samples from finite populations. They are frequently used in 
the social sciences, since the sampled populations in these fields 
are often finite. 

If a given population consists of a list of objects in which the 
attribute of an object is independent of its position on the list, 
members of the population selected by some ordinal position 
(every tenth, every twentieth, or every tenth position per page) 
will be a random sample for the purpose of studying the given 
attribute. For example, a list of students alphabetically 
arranged in a college directory would presumably be an arrange¬ 
ment independent of student heights. Accordingly, some 
ordinal selection, say the tenth and twentieth name on each 
page, would give a random sample for the purpose of studying 
student heights. 
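Ordinal selection amounts to taking every k-th entry of a list, starting from a fixed position. The Python sketch below uses a hypothetical directory of 500 students and selects the 10th, 20th, 30th, and so on. The function name is our own.

```python
def ordinal_selection(items, step, start=0):
    """Every `step`-th item, beginning at 0-based index `start`."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return items[start::step]


# Hypothetical alphabetical directory of 500 students
directory = [f"Student {i:03d}" for i in range(1, 501)]
sample = ordinal_selection(directory, step=10, start=9)  # 10th, 20th, 30th, ... names
print(len(sample), sample[:3])  # 50 ['Student 010', 'Student 020', 'Student 030']
```

The sample is random only on the text's assumption that position in the list is independent of the attribute studied.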

The method of ordinal selection must be used with care. If a
reporter of a college newspaper visits every tenth room in a given
dormitory and if the dormitory is so arranged that each entry has
only 10 rooms, it might happen that he would visit only a first-floor
room in each entry. If he were seeking data of an economic
or social character, the preferred location of the rooms visited 
might give a definite bias to the sample obtained. This would 
be a case in which the method of selection failed to be independent 
of the attributes of the members of the population in which the 
reporter was interested. 
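The dormitory pitfall can be simulated in Python. The layout is hypothetical: 20 entries, each with 10 rooms arranged 2 to a floor over 5 floors, listed entry by entry and floor by floor. Because the list repeats with period 10, taking every tenth room lands on the first floor every time. A random sample of the same size does not.

```python
import random

ENTRIES = 20          # hypothetical number of entries
ROOMS_PER_ENTRY = 10  # as in the text
ROOMS_PER_FLOOR = 2   # hypothetical: 5 floors of 2 rooms each

# (entry, floor) for every room, listed entry by entry, floor by floor
rooms = [(entry, pos // ROOMS_PER_FLOOR + 1)
         for entry in range(1, ENTRIES + 1)
         for pos in range(ROOMS_PER_ENTRY)]

every_tenth = rooms[0::10]
floors_ordinal = sorted({floor for _, floor in every_tenth})

rng = random.Random(1)
random_rooms = rng.sample(rooms, len(every_tenth))
floors_random = sorted({floor for _, floor in random_rooms})

print(floors_ordinal)  # [1]
print(floors_random)
```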

In the use of ordinal selection, care must also be taken to 
note whether the list from which selection is made comprises the 
whole population being studied or is merely assumed to be representative
of that population. The list of telephone subscribers
in the city of New York might be taken, for example, to be
representative of the whole adult population of the city. For
some purposes, however, this would be a risky assumption, for
very poor families cannot afford telephones, and any study of
the social characteristics of the population would be biased unless
it included representatives of this poorer section. The Literary
Digest poll in 1936 failed to predict the outcome of the
presidential election accurately because it relied heavily upon lists of
names that were not representative of the whole population with
respect to attributes affecting their votes.

Mechanical Randomizing Devices. Random samples are often
sought by the use of various “randomizing” devices. Suppose, 
for example, that each member of a finite population is identified 
by a number. If this number is written on a small piece of paper 
and inserted in a small metal cylinder and if these cylinders are 
put in a revolving drum and thoroughly mixed, the selection of a 
sample of cylinders from the drum might possibly be taken to be 
a random sample from the given population. This type of 
randomizing device is used in many lotteries. 

The shuffling of cards is another randomizing device. If the 
population is small enough, the numbers representing its various 
members may be written on cards and these may then be thor¬ 
oughly mixed by shuffling. The withdrawal of a given number 
of cards from the shuffled pack may be taken as a random sample 
from the given population. Such a sample may be taken without 
replacing any of the cards; or the number of each card may be 
recorded, the card replaced, and the pack reshuffled before the 
next card is drawn. Both these would constitute random
samples, but the probability of a given type of sample would be 
different under the two procedures. 1 
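How the two procedures assign different probabilities to the same sample can be seen by enumerating every ordered outcome. The Python sketch below does this for a hypothetical pack of five numbered cards, with two cards drawn.

```python
from fractions import Fraction
from itertools import permutations, product

cards = [1, 2, 3, 4, 5]  # hypothetical five-member population
n = 2

with_replacement = list(product(cards, repeat=n))  # card replaced and pack reshuffled
without_replacement = list(permutations(cards, n))  # card kept out of the pack


def prob(outcomes, event):
    """Exact probability of an event when all listed outcomes are equally likely."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_pair_1_2(outcome):
    return sorted(outcome) == [1, 2]


def has_repeat(outcome):
    return len(set(outcome)) < len(outcome)


print(len(with_replacement), len(without_replacement))  # 25 20
print(prob(with_replacement, is_pair_1_2), prob(without_replacement, is_pair_1_2))  # 2/25 1/10
print(prob(with_replacement, has_repeat), prob(without_replacement, has_repeat))  # 1/5 0
```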

Similar devices make use of dice, roulette wheels, and other 
gambling instruments. Although a random sample may possi¬ 
bly be procured by such means, hidden bias in the mechanical 
device or in the procedure must be carefully avoided. If lottery
cylinders are withdrawn by hand, the person making the withdrawals
may have an unconscious predilection toward the
cylinders of a certain shape, smoothness, or texture. If numbers
are written on cards with ink, the difference in the size of the
number might affect the stickiness of the cards and cause some
to be withdrawn less often than others. The use of dice notoriously
results in bias; Karl Pearson's study of the gaming results
at Monte Carlo indicates that the odds against the absence of
bias are extremely large.² If the possibility of deliberate falsification
is precluded, the bias in the Monte Carlo results “would
appear to arise from small imperfections in the roulette wheel
which direct the ball into some compartments in preference to
others.”³ For these reasons, mechanical randomizing devices
of this kind are no longer in high favor.

Tables of Numbers. When it is possible to associate a number
with each member of the population, tables of numbers such as 
tables of squares and other powers, tables of logarithms, various 
statistical tables, and even tables of telephone numbers are some¬ 
times employed in the attempt to secure a random sample. For 
example, if a population consists of 100 members and a number 
from 0 to 99 is associated with each member of the population, a 
statistical table involving seven-place figures, say, might be 
opened to any arbitrarily selected page and the last two digits of 
each of the first 10 lines of the table read off. The members of 
the population whose numbers correspond to the 10 pairs of two 
digits obtained in this way (09, 01, 07, etc., being read 9, 1, 7, 
etc.) might be considered to be a random sample of 10 from the 
given population. 

1 For further discussion of these two cases, see Chap. IX, pp. 186-190,
209-211.

2 Chances of Death, Vol. I, Edward Arnold, London and New York (1897),
pp. 42-62.

3 Kendall and Babington Smith, op. cit., p. 156.
The success of this method of obtaining a random sample
depends on the digits in the table occurring in a random order.
Unfortunately, in many tables of numbers, such as tables of
logarithms, a relationship exists between the figures in successive
rows. Such tables, of course, cannot be used to draw a random
sample. Even tables that are apparently random in character
must be used with caution. Kendall and Babington Smith give
the following account of an experiment with telephone numbers:¹
“We have attempted to construct a random series by selecting
digits from the London Telephone Directory. In order to
exclude bias as far as possible, pages were taken by opening
the book haphazardly; numbers of less than four digits were
ignored; numbers associated with names printed in heavy type
were also ignored; and only the two right-hand digits were taken.

“It was found that a series of this kind was significantly biased.
There appeared a deficiency of fives and nines. ... 

“The reasons for this effect are complicated and are not confined 
to the obvious one that telephone engineers would avoid fives 
and nines because of their assonance. It thus appears that the 
London Directory is useless as a source of random digits.” 
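One simple check of the kind described is a frequency test: if a source of digits is random, each of 0 through 9 should appear about one-tenth of the time. The Python sketch below computes χ² with 9 degrees of freedom and compares it with the .05 point, 16.919. Two data sets are illustrative only and are not the directory data. One is drawn from Python's pseudo-random generator. The other is a hypothetical source in which 5s and 9s are under-represented.

```python
import random
from collections import Counter

CHI2_05_DF9 = 16.919  # upper .05 point of chi-square with 9 degrees of freedom


def digit_frequency_chi2(digits):
    """Chi-square comparing the counts of 0-9 with equal expected frequency."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts[d] - expected) ** 2 / expected for d in range(10))


rng = random.Random(42)

# Digits from Python's pseudo-random generator (a modern stand-in)
generated = [rng.randrange(10) for _ in range(10_000)]

# Hypothetical biased source: a drawn 5 or 9 is discarded half the time
biased = []
while len(biased) < 10_000:
    d = rng.randrange(10)
    if d in (5, 9) and rng.random() < 0.5:
        continue
    biased.append(d)

for name, digits in (("generated", generated), ("biased", biased)):
    stat = digit_frequency_chi2(digits)
    verdict = "reject" if stat > CHI2_05_DF9 else "do not reject"
    print(f"{name}: chi-square = {stat:.1f} -> {verdict} equal frequency at .05")
```

A frequency test is only one of several tests of randomness. A source can pass it and still show serial patterns.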

Random Sampling Numbers. Because ordinary tables of numbers
may fail to yield random sets of digits, attempts have been
made to construct special tables of random sampling numbers 
that may be used with some confidence. One such table is 
Tippett’s Random Sampling Numbers. This consists of digits 
picked at random from British census reports. The digits are 
arranged in groups of four so as to provide 26 pages of 400 four- 
figured numbers, or a total of 10,400 four-figured numbers. 
These figures have been subjected to a number of different tests 
of randomness. Generally the results were deemed satisfactory,
although one investigator found that the figures were somewhat 
“patchy” in meeting the tests. 2 The table and a description 
of the various methods of using it have been published by the 
Department of Applied Statistics, University of London, Uni¬ 
versity College, as Tract XV of its Tracts for Computers. 

Another set of 5,000 random sampling numbers has been 
published in the Journal of the Royal Statistical Society by 

1 Ibid., pp. 156-157.

2 See Yule, G. Udny, “A Test of Tippett's Random Sampling Numbers,”
Journal of the Royal Statistical Society, Vol. 101 (1938), pp. 167-172.
M. G. Kendall and B. Babington Smith.¹ These were obtained
from a special randomizing machine, which its inventors describe 
as follows: 2 “Essentially the machine consists of a disk divided 
into 10 equal sections, on which the digits 0 to 9 are inscribed. 
The disk rotates rapidly at a speed which can, if necessary, be 
made constant to a high degree of approximation by means of a 
tuning fork. The experiment is conducted in a dark room, and 
the disk is illuminated from time to time by an electric spark or 
by a flash of a neon lamp, which is of such short duration that the 
disk appears to be at rest. At each flash a number is chosen 
from the apparently stationary disk by means of a pointer fixed 
in space. 

“In the actual experiment, the disk was rotated by an electric 
motor at about 250 revolutions per minute. It was illuminated 
by a neon lamp in parallel with a condenser in an independent 
electric circuit which was broken by means of a key. Owing to 
experimental conditions, the time between the making of the 
circuit and the passing of the flash varied, but to add an extra 
element of randomness the key was tapped irregularly by the 
experimenter. Flashes occurred, on the average, about once in
3 or 4 seconds.” 

These random sampling numbers of Kendall and Babington 
Smith have also been subjected to various tests of randomness 
with satisfactory results. 

To draw a random sample, tables of random sampling num¬ 
bers are used according to the procedure already explained in 
connection with other tables of numbers. After the members 
of the population are assigned numbers, a sample of size N is 
picked by selecting any set of N numbers from the table of 
random sampling numbers. Suppose, for example, that a 
sample of 100 from a population of 10,000 commercial banks 
doing business in a given area is desired. The banks are, it 
happens, listed in a directory. The first bank in the directory 
can thus conveniently be numbered 0 and the last 9999; and 
then Tippett's tables can be opened to any page and the numbers 
in the first column read off. The banks whose numbers correspond
to the numbers so selected will then constitute a random
sample of 100 from the whole population of 10,000 banks. 
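The procedure for the bank example can be sketched in Python. Two assumptions are ours. First, Python's pseudo-random generator stands in for Tippett's printed table, whose entries are not reproduced here. Second, a number already used is skipped, so that 100 different banks are chosen. The text does not say how repeats are handled. The directory and function name are hypothetical.

```python
import random


def select_from_number_stream(numbers, population_size, n):
    """Read numbers in order, skipping out-of-range values and repeats, until n are chosen."""
    chosen, seen = [], set()
    for number in numbers:
        if 0 <= number < population_size and number not in seen:
            seen.add(number)
            chosen.append(number)
            if len(chosen) == n:
                return chosen
    raise ValueError("ran out of numbers before n members were chosen")


# Python's generator stands in for a printed table of four-figure numbers
rng = random.Random(2024)
table_stream = (rng.randrange(10_000) for _ in range(1_000))

banks = [f"Bank {k:04d}" for k in range(10_000)]  # hypothetical directory, 0000-9999
selected = select_from_number_stream(table_stream, len(banks), 100)
sample = [banks[k] for k in selected]
print(sample[:5])
```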

1 Op. cit., pp. 164-166.

2 Ibid., p. 157.






Instead of taking the numbers from the columns of the table, 
they could have been selected by rows or even by diagonals. In 
each case, a random sample will be obtained. On the assumption 
that Tippett’s numbers are a random set, the selection of any 
subset of these numbers in a way that is independent of the 
numbers themselves will yield a random sample. Kendall’s 
and Babington Smith’s numbers can be used in the same way. 

Natural Selection. The methods of sampling described above
are of use primarily in sampling from finite existent populations. 
For the most part they cannot be used to obtain random samples 
from an unknown hypothetical infinite population. 1 A popula¬ 
tion that would be hypothetically generated by the continuous 
operation of some physical, biological, or social process is an 
unknown hypothetical infinite population. When these proc¬ 
esses occur in everyday life, any existent results of that process 
may be tentatively taken as a random sample of the infinite 
population of results. The selection here is a “natural” one in 
that it is effected by the ordinary forces of everyday life. 

The role of the investigator in the case of hypothetical infinite 
populations is to study the circumstances under which the natural 
selection of the sample was effected in order to see whether some 
unusual influences might have given a special bias to this particular
sample and consequently might have destroyed its naturally
random character. Thus any set of white birth statistics
might tentatively be taken as a random sample of the population
of white births. If, however, the sex ratio were being investigated
and it were known that geographical location had a significant
effect on the ratio of male to female births, then data on
white births in a given area could be looked upon as a random 
sample only of white births in that area and not of births in all 
areas. 

Naturally occurring data thus provide their own method of 
selection. This is true whether the process giving rise to them 
is one of everyday life or a process carried out in some experi¬ 
mental laboratory or research station. In testing the effects of a 
certain serum, care may be taken to see that the guinea pigs 

1 When the population is known, however, and sampling is undertaken 
primarily for experimental purposes, random sampling numbers may be 
used to obtain random samples from infinite populations. See Yule, 
G. Udny, and M. G. Kendall, op. cit ., pp. 343-344. 
used for the experiment are not noticeably abnormal, but
otherwise the peculiar characteristics of each pig are those that 
chance happens to provide. Again, in agricultural experiments 
with fertilizer, the peculiar influences that affect each plot are 
those that are provided by the chance variations in soil, wind, 
rain, sunshine, and other natural forces. 

In experimental research, however, care must be taken to 
design an experiment so that the balance of natural forces is not 
all to the one side or to the other for certain parts of the data. 
Thus, if the difference in rate of growth of self-fertilized and cross- 
fertilized plants is being studied, and if the cross-fertilized plants 
are all placed on the sunny side of the experimental plot, the 
effect of the sun and any possible effect of the difference in 
fertilization will be so confounded that it will be difficult if not 
impossible to determine whether any significant difference in the 
rate of growth is due to the one or to the other. If an equal 
number of plants of both types are assigned at random to the 
two sides of the plot, however, then the constant effects of the sun 
will be eliminated. With this arrangement or design of the 
experiment, the possibility that the recorded difference in rate 
of growth may be due to chance instead of to a difference in 
method of fertilization may be determined by probability theory. 1 
It is the role of the experimenter in these cases so to design his 
experiment that natural forces do provide him with a random 
sample. 
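The randomized layout for the fertilization experiment can be sketched in Python. Within each type, the plants are shuffled, and half are sent to each side of the plot. Each side then receives an equal number of self-fertilized and cross-fertilized plants, and which plants go where is left to chance. The plant labels are hypothetical, and the sketch assumes an even number of plants of each type.

```python
import random


def assign_to_sides(plants_by_type, seed=None):
    """Randomly send half of each type of plant to each side of the plot."""
    rng = random.Random(seed)
    layout = {"sunny": [], "shady": []}
    for plants in plants_by_type.values():
        shuffled = list(plants)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        layout["sunny"].extend(shuffled[:half])
        layout["shady"].extend(shuffled[half:])
    return layout


plants = {
    "self": [f"S{i}" for i in range(1, 11)],   # hypothetical self-fertilized plants
    "cross": [f"C{i}" for i in range(1, 11)],  # hypothetical cross-fertilized plants
}
layout = assign_to_sides(plants, seed=3)
for side, members in layout.items():
    print(side, sorted(members))
```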

In the case of very large finite populations, natural selection 
is also sometimes used to get a random sample. An organization 
investigating public opinion, for example, may send an agent 
to a given locality where the first dozen people he meets, of the 
sort whose opinion is desired, may be taken as a random sample 
of the given class of people. The sample is thus picked for him 
by chance forces. For the 1940 United States census, schedules 
were printed so that those persons whose names happened to 
fall on certain lines of each schedule were asked supplementary 
questions. Two out of the forty lines on each side of a schedule 
were marked off for this purpose so that a 5 per cent sample was 
obtained on the supplementary questions. 2 

1 Cf. Fisher, R. A., The Design of Experiments, Chap. III.

2 See Stephan, F. F., W. E. Deming, and M. H. Hansen, “The Sampling 
Procedure of the 1940 Population Census,” Journal of the American Statisti¬ 
cal Association, Vol. 35 (1940), pp. 615-630. 
SAMPLING AND THE THEORY OF STATISTICAL INFERENCE

If a sample is drawn from an unknown population in a random
manner, certain inferences about the population may be made
on the basis of probability theory. The following sections will 
discuss in a general way the testing of particular hypotheses 
regarding a population and the estimation of population param¬ 
eters. The application of this general theory to special problems 
will be discussed in detail in subsequent chapters. 

Testing Hypotheses Regarding the Population. Often a 
problem presents itself in such a way that a definite hypothesis 
regarding the population is offered for testing. A fruit merchant 
about to buy a carload of oranges wishes them to be of good 
quality; he does not want to buy the carload, let us say, if more 
than 10 per cent of the oranges are substandard. Accordingly, 
he will want to test the hypothesis that the carload is 10 per cent 
substandard. If this hypothesis is rejected, because a random 
sample suggests that the number of substandard oranges is more
than 10 per cent, he will not buy the carload. But if the hypo¬ 
thesis is not rejected, because the random sample suggests that 
the number of substandard oranges is not greater than 10 per 
cent, he will buy the oranges. 

A similar problem is illustrated by the manufacturer of electric- 
light bulbs who does not wish to adopt a proposed new process 
of production unless the mean length of life of the bulbs produci¬ 
ble by it is 1,000 kilowatt-hours or more. He knows from past 
experience that electric-light bulbs vary in length of life in 
accordance with the normal frequency curve. He will thus want 
to test the hypothesis that the population of bulbs producible 
by the new process is a normal population with a mean of 1,000
kilowatt-hours. If this hypothesis is rejected, because a random
sample suggests that the mean length of life is less than 1,000
kilowatt-hours, the manufacturer will not adopt the new process.
But if the hypothesis is not rejected, because the random sample 
suggests that the mean length of life is not less than 1,000 kilo¬ 
watt-hours, he will adopt the new process. These are examples 
in which the problem itself suggests a particular hypothesis to 
be tested and alternatives to it. 

Errors in Testing Hypotheses. In testing a particular hypothesis
regarding a population, two kinds of errors may be made.
The first of these errors, hereafter referred to as “error I,” is the 
rejection of the hypothesis when it is actually true. That is,
the hypothesis will be deemed unreasonable on the basis of the
sample obtained, although in fact the population is exactly as
the hypothesis assumes. The second kind of error, which will
be referred to as “error II,” is the failure to reject the hypothesis,
i.e., the failure to consider it unreasonable on the basis of the
given sample, when in fact the population is not the same as the
hypothesis assumes. The aim of the statistical testing of
hypotheses is twofold: it seeks on the one hand to limit the risk
of the first kind of error to a preassigned amount and, on the
other hand, to minimize the risk of the second kind of error.
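The two risks can be made concrete for the orange merchant. The rule below is hypothetical and not from the text: examine 100 oranges chosen at random, and reject the hypothesis that the carload is 10 per cent substandard if more than a stated number are substandard. The risk of error I is computed at the hypothesized 10 per cent. The risk of error II is computed at an assumed true rate of 20 per cent. The Python sketch uses exact binomial probabilities. Raising the cutoff lowers the first risk and raises the second.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for a binomial variable with n trials and probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def error_risks(n, cutoff, p_hyp, p_alt):
    """Rule: reject the hypothesis if more than `cutoff` of n oranges are substandard."""
    risk_I = 1 - binom_cdf(cutoff, n, p_hyp)  # reject although p = p_hyp is true
    risk_II = binom_cdf(cutoff, n, p_alt)     # fail to reject although p = p_alt
    return risk_I, risk_II


# Hypothetical: n = 100, hypothesis 10 per cent, alternative 20 per cent
risks = {c: error_risks(100, c, 0.10, 0.20) for c in (13, 15, 17)}
for cutoff, (r1, r2) in risks.items():
    print(f"cutoff {cutoff}: risk of error I = {r1:.3f}, risk of error II = {r2:.3f}")
```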

Procedure for Limiting Risk of Error I. To limit the risk of
falsely rejecting a given hypothesis to a preassigned amount, the
following procedure is adopted:

First, the hypothesis to be tested is assumed to be true.

Second, a certain statistic, such as a mean or standard deviation,
is selected, and probability theory is employed to show
how this statistic might be expected to vary from sample to
sample on the assumption that the hypothesis is true. The
process consists in supposing that a large number of samples of
given size have been drawn at random from the assumed population
and then finding the frequency distribution of the sample
values of the selected statistic. This distribution of sample
values is called the "sampling distribution" of the statistic, and
the standard deviation of this distribution is called the "standard
error" of the statistic. The argument here is purely theoretical
and is based upon the probability calculus.

The third step is to specify the degree of risk that the investigator
is willing to take in rejecting the given hypothesis when it
is true. This degree of risk is called the "coefficient of risk."

The fourth step is to study the sampling distribution of the
selected statistic and to mark off a range of values for which
the probability of the statistic falling within it is just equal to the
coefficient of risk. This range is called the "region of rejection,"
and the remaining range of values is called the "region of acceptance."

To illustrate, suppose that the adopted coefficient of
risk is .05. Furthermore, suppose that the statistic used to
test the given hypothesis is the mean of the sample and that the
probability calculus indicates that means of random samples
from the assumed population will be distributed in the form of a
normal frequency distribution, the mean of which is 1,000 and
the standard deviation of which is 20. 1 From the properties of
a normal distribution, it is known that .05 of the frequencies of a
normal distribution lie below the mean minus 1.645σ. Hence the
range of values below 1,000 - 1.645(20) = 967.1 could be taken
as a .05 region of rejection. The range of values from 967.1
upward would constitute the corresponding region of acceptance
(see Fig. 43).

The final step is to note whether the value of the selected
statistic for the given sample falls in the region of rejection or the
region of acceptance. For example, if the sampling distribution
and the regions of rejection and acceptance were as indicated
above, and if the mean of the given sample were 965.2, the given
hypothesis would be rejected because this sample value would
fall in the range below 967.1. If, on the other hand, the sample
mean had been 972.3, then the hypothesis would have been
accepted, for the sample value would then fall in the range
above 967.1.

Fig. 43. An .05 limit in the lower tail of a normal sampling distribution.
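As a minimal sketch of the procedure just described, the Python fragment below computes the .05 region of rejection for the example and applies the decision rule to the two sample means mentioned in the text. It assumes, as the example does, that the sample mean is normally distributed with mean 1,000 and standard error 20. The names `critical_value` and `decide` are illustrative only.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean if the hypothesis is true
# (values from the text: mean 1,000, standard error 20).
HYP_MEAN = 1000.0
STD_ERROR = 20.0
RISK = 0.05  # coefficient of risk

sampling_dist = NormalDist(mu=HYP_MEAN, sigma=STD_ERROR)

# Lower-tail region of rejection: everything below the .05 point.
critical_value = sampling_dist.inv_cdf(RISK)

def decide(sample_mean: float) -> str:
    """Reject if the sample mean falls in the region of rejection, else accept."""
    return "reject" if sample_mean < critical_value else "accept"

print(round(critical_value, 1))        # 967.1
print(decide(965.2), decide(972.3))    # reject accept
```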

This procedure ensures that the risk of rejecting a given hypothesis
when it is true will be just .05. For if it is always followed
in testing hypotheses, false rejections will be made only 5 per
cent of the time, since sample values will fall in regions of rejection
only that often when hypotheses are true.
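The 5 per cent figure can be checked by simulation. The sketch below works under assumed conditions: a normal population with mean 1,000 and standard deviation 100, samples of 25, and an arbitrary random seed. It draws many samples from the population the hypothesis describes, estimates the standard error of the mean from the spread of the sample means, and counts how often the 967.1 limit would reject the true hypothesis.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the run is reproducible

POP_MEAN, POP_SD, N = 1000.0, 100.0, 25  # the hypothesis is true here
CRITICAL = 967.1                          # .05 lower limit from the text
TRIALS = 20_000

sample_means = []
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

# Spread of the simulated sampling distribution; theory gives 100 / sqrt(25) = 20.
std_error = statistics.stdev(sample_means)
# Share of samples that would falsely reject the true hypothesis; theory gives .05.
false_rejection_rate = sum(m < CRITICAL for m in sample_means) / TRIALS
print(round(std_error, 2), round(false_rejection_rate, 3))
```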

It will be noted that there are three arbitrary elements in the
procedure. One is the selection of the statistic to be used, the
second is the selection of the coefficient of risk, and the third
is the selection of the regions of rejection and acceptance. The
choice of the statistic will depend in part on the ease of calculation
and in part on the size and position of the regions of rejection
and acceptance to which it gives rise. The latter consideration
is of prime importance since, as pointed out below, the size and
position of the regions of rejection are a major factor in reducing
the error of accepting a hypothesis when it is not true. This
will be more fully discussed later.

1 As indicated in Chap. VI, the standard deviation of the sampling distribution
of the mean, i.e., the standard error of the mean, is equal to the
standard deviation of the population divided by the square root of the size
of the sample. For samples of 25, say, a standard error of 20 implies a
population standard deviation of 100.

The selection of the coefficient of risk will depend in most
instances upon the nature of the problem. If action based upon
nonacceptance of a hypothesis is not of great significance, the
investigator may be willing to run considerable risk of falsely
rejecting a hypothesis. In testing a certain manufacturing
process, for example, it may be that any tendency for the process
to deteriorate may be corrected with relatively little expense.
The manufacturer may in such instances be willing to undergo
an occasional overhauling of his process when in fact there is no
real need for such overhauling. In other cases, however, the
expense of overhauling may be very great, and the manufacturer
may be willing to undergo an unnecessary overhauling only very
infrequently and may gladly undertake considerable expense for
statistical testing to avoid any such unnecessary overhauling.

In the former case, the manufacturer might be willing to
adopt a coefficient of risk of .10 (a risk of .10 that a process in
good order will nevertheless be overhauled); in the latter, even a
coefficient of risk of .01 (a risk of .01 of such an unnecessary
overhauling) might be too great. In cases where serious or
dangerous medical treatment is involved, a still smaller coefficient
of risk might be adopted. A coefficient of .05 is one that has
come to be frequently used in cases where there is no strong
reason for adopting a very high or a very low figure.

The third arbitrary element, the choice of the regions of rejection
and acceptance, is illustrated in Fig. 44. For any given
statistic and coefficient of risk it is possible to pick innumerable
regions of rejection and acceptance. In the previous illustration
the range of values below 967.1 was taken as the region of rejection,
and the range of values from 967.1 upward was taken as the
region of acceptance. The range of values above 1,000 + 1.645σ,
that is, above 1,032.9, could equally well have been used as a
.05 region of rejection and the range of values from 1,032.9 downward
as the corresponding region of acceptance, for the probability
of a sample mean falling above the mean plus 1.645σ is
likewise .05. This probability of .05 could also have been
obtained by splitting the region of rejection so that it included the
range of values below 1,000 - 1.96σ, that is, below 960.8, and
the range of values above 1,000 + 1.96σ, that is, above 1,039.2.
The corresponding region of acceptance in this case would have
been the range of values from 960.8 to 1,039.2, as shown in the
third section of Fig. 44. 1 Still other regions of rejection could
have been found that would have had a probability of .05.
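The three regions of Fig. 44 can be sketched numerically. The fragment below assumes, as the text does, a normal sampling distribution with mean 1,000 and standard deviation 20. It computes the limits of each .05 region of rejection and checks that each region has probability .05 when the hypothesis is true.

```python
from statistics import NormalDist

# Sampling distribution of the sample mean if the hypothesis (mean 1,000) is true.
sampling_dist = NormalDist(mu=1000.0, sigma=20.0)
RISK = 0.05  # coefficient of risk

# Region I: reject below the lower .05 point.
lower_limit = sampling_dist.inv_cdf(RISK)
# Region II: reject above the upper .05 point.
upper_limit = sampling_dist.inv_cdf(1 - RISK)
# Region III: split the risk, .025 in each tail.
split_low = sampling_dist.inv_cdf(RISK / 2)
split_high = sampling_dist.inv_cdf(1 - RISK / 2)

# Probability of each region of rejection when the hypothesis is true.
p_reject = {
    "I": sampling_dist.cdf(lower_limit),
    "II": 1 - sampling_dist.cdf(upper_limit),
    "III": sampling_dist.cdf(split_low) + (1 - sampling_dist.cdf(split_high)),
}

print(round(lower_limit, 1), round(upper_limit, 1),
      round(split_low, 1), round(split_high, 1))
print({name: round(p, 3) for name, p in p_reject.items()})
```

All three regions carry the same risk of error I. What distinguishes them is their risk of error II, which the following discussion takes up.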

Any of these regions would have limited the risk of falsely
rejecting the given hypothesis to the preassigned coefficient of
risk, viz., .05. They would not, however, all have been equally good
with respect to the risk of the second kind of error, viz., that of
accepting the given hypothesis when it is not true. It is with the
problem of selecting regions of rejection and acceptance that
will minimize this second kind of error that the ensuing discussion
is concerned.

Procedure for Minimizing Risk of Error II. If the population 
is not as specified by a given hypothesis, the probability of a 
sample falling in any region of acceptance selected for limiting 
the risk of error I will depend on the position of the region in 
relation to the actual character of the population. If there 
were one particular region of acceptance for which the probability 
of a sample falling within it was less than that of any other 
region giving the same risk of error I, no matter how the population
differed from the given hypothesis, this would obviously
be the best region that could be selected. For, in this case, the 
probability of accepting the false hypothesis would be less than 
it would be for any other region giving the same risk of error I. 

Such a best critical region is usually impossible to determine. 
Ordinarily, one region will be the best that can be chosen if the 
population differs from the given hypothesis in one direction, 
while another region will be the best if the population differs 
from it in another direction. If the statistician is concerned with 
the possibility of the population differing from the given hypothesis
in whatever direction possible, some compromise region
is generally employed. 

The selection of good regions of rejection and acceptance will
be illustrated by reference again to the three diagrams of Fig. 44.
Suppose that the hypothesis to be tested is that the mean of the
population from which a sample has been drawn equals 1,000,
and suppose that probability theory, together with certain facts
known about the population, indicates that the sampling distribution
of means of samples from this assumed population will
be a normal distribution with a mean of 1,000 and a standard
deviation of 20. Finally, suppose that the three regions of
rejection and acceptance shown in Fig. 44 are considered for
adoption. The question is which of these regions will be the
best to employ. The answer is that it depends on the nature
of the problem; this can be illustrated by concrete cases.

1 This is the only type of region that was discussed in J. G. Smith and
A. J. Duncan, Elementary Statistics and Applications, Chap. XII.

Suppose that the data of the given example pertain to the
length of life of electric-light bulbs, measured in kilowatt-hours,
producible by some new process. The manufacturer of these
bulbs, it can be argued, will desire especially to avoid acceptance
of the hypothesis that the average length of life of the new bulbs
is 1,000 kilowatt-hours when in fact it is less than that. For if
he accepts the given hypothesis when actually the mean length
of life is greater than 1,000 kilowatt-hours, he stands to lose
nothing; he in fact gets a better process than he expected. But
if he accepts the hypothesis of a mean length of life of 1,000
kilowatt-hours when actually the mean is less than that, he gets
a poorer process than he expected. The quality of his product
will not reach the desired standard, and the result may be a
considerable loss of money. In this instance, therefore, the
manufacturer will want to adopt regions of rejection and acceptance
that will minimize the risk of accepting the hypothesis
when actually the mean length of life of the bulbs is less than
1,000. This set of regions of rejection and acceptance is shown
in the first diagram of Fig. 44.

By the selection of this set of regions of rejection and acceptance,
the risk of error II is minimized; this is illustrated by the three
diagrams in Fig. 45. The three diagrams in this figure show the
effects of applying the three sets of regions of rejection and
acceptance shown in Fig. 44. The three curves in Fig. 45 show
the true probabilities of obtaining sample means of varying
sizes, forming a normal distribution about the true population
mean of 980, superimposed over the scale showing the regions
of rejection and acceptance according to the hypothesis of a
mean of 1,000, shown in Fig. 44. The diagrams in Fig. 44, from
which the regions of acceptance were obtained, show the probabilities
of obtaining sample means of varying sizes, forming a
normal distribution with a mean of 1,000 and a standard deviation
of 20.

Fig. 45 suggests that the risk of accepting the hypothesis
of 1,000 when the actual mean is less than 1,000 is less for the
region of acceptance running from 967.1 upward than for any
other region that involves the same risk of rejecting the given
hypothesis when it is true (i.e., for any other .05 region). In
each of the three diagrams the proportion of the area under the
curve crosshatched indicates the probability of accepting the
hypothesis that the mean is 1,000, when in fact it is 980, that is,
the probability of accepting a hypothesis when it is not true.
Part I of the figure shows the probability of a sample mean
falling above 967.1, part II shows the probability of its falling
below 1,032.9, and part III the probability of its falling between
960.8 and 1,039.2. These probabilities are the risks of accepting
the hypothesis of a mean of 1,000 when the true mean is 980
and when the region of acceptance for testing the given hypothesis
runs (I) from 967.1 upward, (II) from 1,032.9 downward, and
(III) from 960.8 to 1,039.2.
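The comparison in Fig. 45 can be reproduced numerically. This sketch assumes, as the text does, that the sample mean is normally distributed with standard error 20 whatever the true mean may be. The hypothetical function `acceptance_probs` returns, for each of the three regions of acceptance, the probability of accepting the hypothesis of 1,000 when the true mean has some other value, that is, the risk of error II.

```python
from statistics import NormalDist

HYP = NormalDist(1000.0, 20.0)  # sampling distribution if the hypothesis is true
RISK = 0.05

# Limits of the three .05 regions (see Fig. 44).
lo1 = HYP.inv_cdf(RISK)                                   # region I
hi2 = HYP.inv_cdf(1 - RISK)                               # region II
lo3, hi3 = HYP.inv_cdf(RISK / 2), HYP.inv_cdf(1 - RISK / 2)  # region III

def acceptance_probs(true_mean: float, std_error: float = 20.0) -> dict:
    """Probability that the sample mean lands in each region of acceptance
    when the population mean is actually true_mean (risk of error II)."""
    true = NormalDist(true_mean, std_error)
    return {
        "I": 1 - true.cdf(lo1),                # accept from 967.1 upward
        "II": true.cdf(hi2),                   # accept from 1,032.9 downward
        "III": true.cdf(hi3) - true.cdf(lo3),  # accept from 960.8 to 1,039.2
    }

risks = acceptance_probs(980.0)
print({name: round(p, 3) for name, p in risks.items()})
```

For a true mean of 980 the three risks come out at roughly .74 (I), .996 (II), and .83 (III). Evaluating `acceptance_probs` at other means below 1,000 shows region I giving the smallest risk each time, in line with the claim that follows.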

It is clear that this risk is least for the first of these three
regions, as shown by the fact that the crosshatched portion of
the area under the curve is smallest in the first diagram of Fig. 45.
Further study of this kind shows that, whenever the mean of the
population is less than 1,000, region I will always give a lower
risk of accepting the hypothesis of 1,000 than any other region
of acceptance for which the risk of falsely rejecting the hypothesis
is .05. For the problem illustrated, therefore, it is the best
region of acceptance that may be employed, since the manufacturer
wants to minimize the risk of accepting the hypothesis
that the mean is 1,000 when in fact it is less. 1 In other instances,
one of the other regions might be preferred.

Fig. 46. Sampling distribution of the mean of a sample when the mean of
the population is 1,000 and the standard deviation of the population is 100.
Region of rejection, .025 points at each end, when N = 25 compared to when
N = 100.

Effect of the Size of the Sample. In testing hypotheses the
size of the sample is of prime importance; for, the larger the
sample, the narrower the sampling distribution, i.e., the smaller
the standard error, of the statistic used to test a hypothesis.
The standard error of the mean, for example, is equal to the
standard deviation of the population, divided by the square root
of the size of the sample. Hence the standard error of the mean
varies inversely with the square root of the size of the sample. 2

1 Of course, if the risk of falsely rejecting the hypothesis were increased,
the risk of falsely accepting the hypothesis could be reduced.

2 See Chaps. XIII, XIV.
From this inverse relationship between the spread of the
sampling distribution and the size of the sample it follows that,
the larger the sample, the closer the finite limits of the region
of acceptance to the central tendency of the sampling distribution
of that statistic, assuming, of course, a constant coefficient
of risk. This increases the likelihood of rejecting a hypothesis
that is not true. In other words, the larger the sample, the
closer the value of a sample statistic must be to its expected average
value if a given hypothesis is to be accepted. This is illustrated
by Figs. 46 and 47, which contrast the regions of rejection
and acceptance with samples of 25 and 100, respectively.



Still more important is the effect of the relationship between
the size of the sample and the standard error of the sample
statistic upon the risk of accepting a hypothesis that is not true.
Since the sampling distribution becomes narrower as the size
of the sample increases, the larger the sample, the less the
probability that a given hypothesis will be accepted when it is
not true. This is illustrated in Figs. 48a and 48b. It is principally
for this reason that greater confidence is put in a test
making use of a large sample than in a test employing a small
sample. In Figs. 48a and 48b, the probabilities of samples
occurring in the regions of acceptance are represented by the
crosshatched portions of the sampling distributions about the
true mean in each case.
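The effect of sample size shown in Figs. 48a and 48b can be sketched as follows. The code assumes a normal population with known standard deviation 100, a hypothesized mean of 1,000, and a true mean of 980, as in the captions. For each sample size it computes the .05 lower rejection limit and the risk of error II. With the exact normal point (about 1.6449 rather than 1.645), the limit for N = 100 comes out near 983.55, which the caption gives as 983.5.

```python
import math
from statistics import NormalDist

POP_SD, HYP_MEAN, TRUE_MEAN, RISK = 100.0, 1000.0, 980.0, 0.05

def lower_tail_test(n):
    """Return the .05 lower rejection limit for samples of size n and the
    risk of error II (accepting a mean of 1,000 when it is really 980)."""
    std_error = POP_SD / math.sqrt(n)
    limit = NormalDist(HYP_MEAN, std_error).inv_cdf(RISK)
    accept_prob = 1 - NormalDist(TRUE_MEAN, std_error).cdf(limit)
    return limit, accept_prob

for n in (25, 100):
    limit, error_ii = lower_tail_test(n)
    print(n, round(limit, 2), round(error_ii, 3))
```

Quadrupling the sample halves the standard error and cuts the risk of error II from roughly .74 to roughly .36, while the risk of error I stays at .05.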

Effect of the Statistic Selected. As pointed out above, the
procedure for testing a given hypothesis may be varied by taking
different sample statistics. A hypothesis regarding the mean
of a normal population, for example, may be tested by using
either the mean or the median of the sample. The former has
the advantage, however, of having a sampling distribution that
is narrower than that of the median. In other words, the
standard error of the mean is less than the standard error of
the median. 1 The use of the mean in preference to the median,
therefore, is the equivalent of employing a larger sample; or,
to put it another way, as good a test can be made with the mean
of a smaller sample as with the median of a larger sample. The
former is accordingly the better statistic to employ for the test.

1 This result may be reversed for other than normal universes.

Fig. 48a. Standard deviation of the population known to be 100; if the
hypothesis is that the mean of the population is 1,000, the range of values
downward from 967.1 will be a .05 region of rejection for testing this hypothesis
with reference to a sample of 25. The range of values upward from 967.1 will be
the corresponding region of acceptance. This figure shows the probability of a
sample falling in this region of acceptance when the true mean is 980 and not
1,000.

Fig. 48b. Standard deviation of the population known to be 100; if the
hypothesis is that the mean of the population is 1,000, the range of values
downward from 983.5 will be a .05 region of rejection for testing this hypothesis
with reference to a sample of 100. The range of values upward from 983.5 will be
the corresponding region of acceptance. This figure shows the probability of a
sample falling in this region of acceptance when the true mean is 980 and not
1,000.
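The advantage of the mean over the median in a normal population can be illustrated by simulation. This sketch assumes a normal population with mean 1,000 and standard deviation 100, samples of 25, and an arbitrary seed. It estimates both standard errors from repeated samples. For large samples from a normal population the median's standard error is about √(π/2) ≈ 1.25 times that of the mean, so the median needs roughly π/2 ≈ 1.57 times as many observations to do as well.

```python
import random
import statistics

random.seed(2)  # arbitrary seed for reproducibility
N, TRIALS = 25, 10_000

means, medians = [], []
for _ in range(TRIALS):
    sample = [random.gauss(1000.0, 100.0) for _ in range(N)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Standard errors estimated from the spread of the simulated sampling distributions.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)
print(round(se_mean, 1), round(se_median, 1), round(se_median / se_mean, 2))
```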

Estimation of Population Parameters. Often the statistician 
is interested in more than the testing of a single hypothesis. 
In some cases, no particular hypothesis is suggested by the 
problem. Even if it is, the statistician may wish to know, not 
only whether a particular hypothesis is acceptable on the basis 
of the sample return, but also what hypotheses in general are 
acceptable and what are not. More exactly, he may wish to 
draw a boundary line for which it can be said that the chances 
are, say, 95 out of 100, that this boundary includes the true 
population. In addition, he may want to select a single hypothesis
as the best hypothesis to be adopted. The theory of estimation
seeks an answer to these questions.

Specification of a Population. A population is precisely specified
when its functional form is given together with the values
of its parameters. A normal population, for example, is precisely
specified when the fact of its normality is given, together
with the values of its mean and standard deviation. In many
cases, however, a population will be reasonably well determined
if its more important parameters are given, such as its mean,
standard deviation, β₁, and β₂. For in this case a Pearsonian
curve or a Gram-Charlier curve can be derived from which the
probabilities of the population can be approximately computed.
Thus, if the form of a population is known, estimates of its
parameters will exactly specify it; if the form is not known, estimates
of its parameters will at least approximately determine
the population. Hence, estimates of a population become largely
estimates of its parameters. 1

In the ensuing discussion it is assumed that the form of the 
population is known a priori. This will make for greater 
simplicity and precision in the analysis and will facilitate the 
presentation of the argument. It will not affect any of the 
general conclusions of this chapter. 

Confidence Intervals. A principal consideration in the theory
of estimation is to determine confidence intervals for a population
parameter. A confidence interval for a given parameter is
a range of values that, on the basis of a given sample, has a
specified probability of including the true value. The probability
associated with any confidence interval is called the "confidence
coefficient" for the interval.

1 In many instances, knowledge of a particular parameter or set of parameters
is all that is desired, so that full knowledge of the population is not
required.

The procedure by which such a confidence interval for a given 
parameter may be obtained will now be traced step by step: 

1. Assume at the start that the form of the population is
known a priori. For simplicity, also assume that the population
is characterized by the value of a single parameter, which may
be designated as θ.

2. Let some sample statistic s be chosen for the purpose of
estimating the value of θ. What statistic is chosen is immaterial
to the present argument.

3. From knowledge of the population, derive by use of the
probability calculus the sampling distribution of s for samples
of the given size. This will give the relative frequencies with
which sample s's may be expected to have different values.
Since the sampling distribution of s is derived from the original
population, the precise nature of the distribution will depend
on the value of the parameter θ. The way in which a sampling
distribution might vary with variations in the parameter θ is
suggested in Figs. 49 to 52.

Fig. 49. A given sample in a region of acceptance.

Fig. 50. A given sample in a region of rejection.
4. Let the confidence coefficient for the given interval be set
at .95. This means that the interval to be determined should
have a probability of .95 of covering the true value of θ.

5. Calculate the value of s for the sample actually obtained.
Then determine the value of the parameter θ that will cause the
upper .025 point of the sampling distribution of s (as noted in
item 3, this distribution varies with the value assigned to θ) to
coincide with the computed sample value of s. Such a value of
θ is pictured in Fig. 51. Call this value of θ "θ lower" and
designate it by θ̲. Next select a value of θ that will cause the lower
.025 point of the sampling distribution of s to coincide with the
computed sample value of s. A value of θ that will give this
result is pictured in Fig. 52. Call this "θ upper," and designate
it by θ̄. The interval θ̲-θ̄ is the desired confidence interval. For
if the foregoing procedure is always followed in setting up confidence
intervals, the true value of θ will fail to be covered only
in those instances in which the sample value of s falls in the upper
or lower .025 tail of the true sampling distribution of s; and this
will occur on the average only 5 times out of 100. Hence, on
the average, the confidence interval θ̲-θ̄ will cover the true
value of θ 95 times out of 100. The values θ̲ and θ̄ are called
the "lower" and "upper confidence limits" of θ.
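Step 5 can be followed literally in code by searching for the value of θ at which a chosen point of the sampling distribution of s coincides with the observed s. The sketch below assumes that s is normally distributed about θ with a known standard error, which is the case of the bulb example worked later in this section. In that case a closed form exists, but the bisection search mirrors the procedure and would apply to any family whose percentage points increase with θ. The function names are hypothetical.

```python
from statistics import NormalDist

def bisect_increasing(func, target, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with func(x) == target, assuming func increases with x."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_limits(s, std_error, coeff=0.95, lo=-1e6, hi=1e6):
    """Invert the sampling distribution of s (assumed normal about theta
    with a known standard error) as in steps 4 and 5."""
    tail = (1 - coeff) / 2

    def upper_point(theta):
        # Upper `tail` point of the sampling distribution of s for this theta.
        return NormalDist(theta, std_error).inv_cdf(1 - tail)

    def lower_point(theta):
        # Lower `tail` point of the same distribution.
        return NormalDist(theta, std_error).inv_cdf(tail)

    theta_lower = bisect_increasing(upper_point, s, lo, hi)  # upper point meets s
    theta_upper = bisect_increasing(lower_point, s, lo, hi)  # lower point meets s
    return theta_lower, theta_upper

print(confidence_limits(965.0, 20.0))
```

With s = 965 and a standard error of 20, the limits come out at about 925.8 and 1,004.2, the values derived by hand in the bulb example.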
In conclusion, it may be noted that there are three arbitrary
elements in the procedure for the determination of confidence
intervals, just as there were three arbitrary elements in the
procedure for testing a hypothesis. One is the choice of the
sample statistic s to be used; the second is the selection of the confidence
coefficient; and the third is the selection of the points
of the sampling distribution of s that are to coincide with the
sample value.

Since the sampling distributions of some statistics are narrower
than others, for the same confidence coefficient a smaller confidence
interval can be obtained in some cases than in others. 1
Some statistics thus provide more accurate estimates of population
values than others.

Confidence intervals can be made smaller 2 if the confidence
coefficient is smaller. Thus, in a given instance one may be 
able to say that the chances are 80 out of 100 that the interval 
50-70 includes the true value; or he might be able to say that the 
chances are 95 out of 100 that the interval 30-90 includes the 
true value. If the investigator is willing to run greater chances 
of being wrong, he may thus reduce the size of the interval that 
is said to include the true value. In matters involving life and 
death a very high confidence coefficient should be adopted. In 
testing to discover the effect of an advertising campaign, on the 
other hand, a much smaller coefficient might be used. 
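The trade-off between confidence coefficient and interval size can be sketched for the simplest case: a statistic that is normally distributed about the parameter with a known standard error. The sample value of 60 and standard error of 10 below are hypothetical, chosen only for illustration. They are not the figures behind the 50-70 and 30-90 intervals in the text.

```python
from statistics import NormalDist

def central_interval(s, std_error, coeff):
    """Equal-tailed interval for a normally distributed statistic
    with a known standard error."""
    z = NormalDist().inv_cdf(1 - (1 - coeff) / 2)
    return s - z * std_error, s + z * std_error

# Hypothetical sample value and standard error, for illustration only.
for coeff in (0.80, 0.90, 0.95, 0.99):
    low, high = central_interval(60.0, 10.0, coeff)
    print(f"{coeff:.2f}: {low:.1f} to {high:.1f} (width {high - low:.1f})")
```

The width grows steadily as the confidence coefficient rises, which is the price of a smaller chance of being wrong.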

The third arbitrary element in the determination of confidence
intervals is the selection of the points of the sampling distribution
of s to serve as the points of coincidence with the sample value
of s. In the analysis above, the upper and lower .025 points
were chosen. For a confidence coefficient of .95 any other set of
points that included 95 per cent of the sample values of s could
have been used, such, for example, as the lower .01 point and
the upper .04 point, or the lower .03 point and the upper .02
point. If an upper or lower limit only is desired for a confidence
interval, the lower or upper .05 point of the distribution of s can
be used. The particular points that are chosen in any case will
depend on the problem in hand. Normally in estimating θ
both an upper and lower bound will be desired. In some
instances, however, the statistician may not care if he overestimates
a certain figure, but he may wish especially not to underestimate
it. In other instances, he may wish particularly to
avoid overestimation.

1 Or if, as noted on pp. 177-178, a confidence interval has only one finite
bound, this finite bound will be closer to the sample value.

2 Or the finite bound of an interval that has one infinite bound can be
brought closer to the sample value.

The whole process of determining confidence intervals may be 
illustrated by reference to the previously used electric-light bulb 
problem. Suppose that a sample batch of 25 bulbs has a mean 
length of life of 965 kilowatt-hours, and suppose that the manufacturer
desires to determine limits for the true mean length of
life that have a probability of .95 of covering this true value. 
To simplify the analysis, suppose that the standard deviation in 
length of life of the bulbs is known to be 100 kilowatt-hours and 
that only the mean is unknown. 1 

To determine confidence limits for the mean of the population
given a sample mean of 965 kilowatt-hours, the procedure is as
follows: First identify the mean of the population as the unknown
parameter θ, and let the statistic s, which will be used to estimate
θ, be the mean of the sample. Next note that the sampling distribution
of the means of random samples from a normal population
can be shown by the probability calculus to be a normal
distribution with a mean equal to the mean of the population
and a standard deviation equal to the standard deviation of the
population divided by the square root of N, the size of the sample.
In the problem in hand, therefore, the sampling distribution
of s will have a standard deviation of 100/√25 = 20 and a mean
whose value will be unknown but which will equal θ.

As the third step in the analysis, note that the value of the
mean of the population that will cause the upper .025 point of
the sampling distribution of s to coincide with the given sample
value of s is 925.8. For the upper .025 point of the sampling
distribution of the mean lies at just 1.96σ above the mean
of the population, i.e., at 1.96 times 20 = 39.2 above the mean of
the population. If this is to coincide with the given sample
mean of 965, then the population mean must have the value
965 - 39.2 = 925.8. The value 925.8 will thus be taken as
the lower confidence limit for the mean of the population.

1 Knowledge of the standard deviation might be obtained from technical 
considerations. 
Similarly, note that the value of the mean of the population that
will cause the lower .025 point of the sampling distribution of
sample means to coincide with 965 is 965 + 39.2 = 1,004.2.
The value 1,004.2 may thus be taken as the upper confidence
limit for the mean of the population. The whole confidence
interval will thus be 925.8 to 1,004.2, and it may be said that the
chances are 95 out of 100 that this interval includes the true
population mean.

If the statistician wished to make an especially conservative
report to the manufacturer about the new process, he might
choose a confidence interval that has only an upper bound (but
still a confidence coefficient of .95). This upper bound would
be determined by finding the value of the mean of the population
that caused the given sample value to fall at the lower .05 point
of the sampling distribution of the mean. For the problem
illustrated, this upper bound would be given by 965 + 1.645
times 20 = 997.9 (for the .05 point of a normal distribution
comes at 1.645σ from the mean). The statistician would
in this instance report to the manufacturer that there is a chance
of 95 out of 100 that the range of values up to 997.9 includes the
true value, or, to put it another way, that there is only a chance
of 5 out of 100 that the range of values above 997.9 includes the
true value.
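The bulb calculations above, and the other choices of coincidence points mentioned earlier, can be collected in one sketch. It assumes a normal sampling distribution of the sample mean with the known standard error of 20. The hypothetical function `limits_from_tails` takes the probability left in each tail. The lower limit is the θ whose upper `upper_tail` point meets s, the upper limit is the θ whose lower `lower_tail` point meets s, and a tail of zero leaves that side unbounded. The coverage is 1 - `lower_tail` - `upper_tail`.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def limits_from_tails(s, std_error, lower_tail, upper_tail):
    """Confidence limits for the mean of a normal sampling distribution with
    known standard error; a tail of 0 gives an infinite bound on that side."""
    theta_lower = (s - Z.inv_cdf(1 - upper_tail) * std_error
                   if upper_tail > 0 else -math.inf)
    theta_upper = (s + Z.inv_cdf(1 - lower_tail) * std_error
                   if lower_tail > 0 else math.inf)
    return theta_lower, theta_upper

# Equal tails, the .01/.04 split from the text, and the upper bound only.
for tails in [(0.025, 0.025), (0.01, 0.04), (0.05, 0.0)]:
    low, high = limits_from_tails(965.0, 20.0, *tails)
    print(tails, f"{low:.1f}", f"{high:.1f}")
```

The equal-tailed choice reproduces 925.8 to 1,004.2, and the one-sided choice reproduces the bound of 997.9. For this symmetric sampling distribution the equal split also gives the shortest two-sided interval of the three.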

Effect of Size of Sample. For confidence intervals making use
of the same statistic, the same confidence coefficient, and the
same type of interval, the larger the sample, the smaller the confidence
interval, or, in the case of an interval having only an
upper or lower bound, the closer the bound to the sample value.
That is, a larger sample is always a better means of estimating
the character of a population than a smaller sample. Of course,
large samples may be more costly to procure than small ones,
and their greater accuracy may not be worth the additional
expense. In all cases, however, improvement in estimates to be
obtained from larger samples should be given consideration, for
the gain from increased size may far overbalance the higher cost.
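For the known-standard-deviation case of the bulb example, the width of a .95 equal-tailed interval for the mean is 2 × 1.96 × σ/√N. The sketch below assumes σ = 100, as in the example, and shows the width shrinking with sample size. Each fourfold increase in N halves the interval, which is the kind of gain to be weighed against the cost of the larger sample.

```python
import math
from statistics import NormalDist

def ci_width(pop_sd, n, coeff=0.95):
    """Width of an equal-tailed interval for a normal mean with known sd."""
    z = NormalDist().inv_cdf(1 - (1 - coeff) / 2)
    return 2 * z * pop_sd / math.sqrt(n)

for n in (25, 100, 400):
    print(n, round(ci_width(100.0, n), 1))  # 78.4, 39.2, 19.6
```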
Maximum-likelihood Estimates of Population Parameters.
Sometimes a problem requires something further than setting
up a range of values that probably includes the true value of a
population parameter. It may be desirable to have a single
figure that can be considered the "best," or "optimum," estimate
that can be made of the population parameter from a given
sample.

It should be noted at the start that various methods may be
employed to estimate population parameters, and it is doubtful
whether any single method is the best for all circumstances. There
is one method, however, the method of maximum likelihood, that
has been received with considerable favor, and it is this method
that will be described here.

The method of maximum likelihood reasons entirely from the sample to the population. It consists in estimating the values of the population parameters so that the probability of the given sample among all possible samples of the same size is a maximum. It will be noted that, if the values of the population parameters are given, a particular set of sample values will have a definite probability. As the parameter values are changed, the probability of the given sample values also changes. According to the method of maximum likelihood, the “best estimates” that can be made of unknown population parameters, given a particular sample, are those values of the parameters which, if they were the true values, would make the probability of the given sample a maximum.

Mathematically the method of maximum likelihood may be described as follows: Suppose that for given values of the population parameters $\theta, \theta', \theta'', \ldots$, the probability of a sample consisting of $X_1, X_2, \ldots, X_N$ is given by the probability distribution $F(X_1, X_2, \ldots, X_N; \theta, \theta', \theta'', \ldots)$. Then for given values of $X_1, X_2, \ldots, X_N$, that is, for a given sample, the best estimates that can be made of $\theta, \theta', \theta'', \ldots$ are those values that maximize the logarithm of the function $F$. The logarithm of $F$, instead of $F$ itself, is maximized because the mathematical analysis is easier. The results are the same, however, for $F$ is a maximum when $\log F$ is a maximum. If the form of $F$ is known, these maximizing values can be found by the use of the differential calculus. Thus the optimum estimates of $\theta, \theta', \theta'', \ldots$ must satisfy the conditions

$$\frac{\partial \log F}{\partial \theta} = 0, \qquad \frac{\partial \log F}{\partial \theta'} = 0, \qquad \frac{\partial \log F}{\partial \theta''} = 0, \qquad \ldots$$

These give as many equations as there are parameters, and their common solution affords the desired estimates.
It will be noted that the values of the population parameters that are found by the method of maximum likelihood are values that depend on the given sample values. In other words, the estimate of each population parameter is expressed as a certain “function” of the sample values. These functions of the sample values constitute “statistics,” for that is what a statistic is: a function of the observed sample values. Usually it is found that the maximum-likelihood estimate of a given population parameter is the sample counterpart of the parameter itself. If the population is normal, for example, the mean and standard deviation of the sample are the maximum-likelihood estimates, respectively, of the mean and standard deviation of the population.
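The conditions above can be checked numerically for the normal case. The sketch below uses a hypothetical sample and writes out $\log F$ for independent normal observations. It evaluates the two partial derivatives at the sample mean and at the sample standard deviation computed with divisor $N$. It then confirms that both derivatives vanish there and that small moves away from that point lower $\log F$.

```python
import math


def normal_log_f(mu: float, sigma: float, xs: list[float]) -> float:
    """log F for independent observations xs from a normal population."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi) - ss / (2 * sigma ** 2)


def partials(mu: float, sigma: float, xs: list[float]) -> tuple[float, float]:
    """Partial derivatives of log F with respect to mu and to sigma."""
    n = len(xs)
    d_mu = sum(x - mu for x in xs) / sigma ** 2
    d_sigma = -n / sigma + sum((x - mu) ** 2 for x in xs) / sigma ** 3
    return d_mu, d_sigma


xs = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5]   # hypothetical sample
n = len(xs)
mean = sum(xs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divisor N, not N - 1

print(partials(mean, s, xs))   # both partial derivatives are (numerically) zero
best = normal_log_f(mean, s, xs)
for d_mean, d_s in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    # True in every case: log F is lower away from (mean, s)
    print(d_mean, d_s, normal_log_f(mean + d_mean, s + d_s, xs) < best)
```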

Sometimes it is desired to estimate a population parameter 
in a way that will be independent of the estimates of other 
parameters. Such an independent estimate of a population 
parameter may be made under either of two conditions. If it 
turns out that in making the maximum-likelihood estimates one of 
the partial derivatives gives an equation containing only a single 
parameter, then this parameter may be estimated immediately 
without troubling with the estimates of the other parameters. 

The second condition permitting an independent estimate of a population parameter may be described as follows: Sometimes it happens that the function $F$ giving the probability of a given sample can be broken up into the product of two factors, one of which depends only on the sample values and on a single population parameter. If this is true, the value of that parameter can be estimated independently of the other parameters by choosing the value that maximizes this part of the total probability. For example, it so happens that the probability of a sample from a normal population can be factored into a part that depends only on the sample values and on the standard deviation of the population. The other part depends on the sample values and on both the mean and the standard deviation of the population. By taking as an estimate of $\sigma$ the value that maximizes the first factor, there will be obtained a maximum-likelihood estimate of the population standard deviation that is independent of the population mean. When the mathematics of this is actually carried out, 1 it is found that this independent estimate of the population standard deviation is

$$\hat{\sigma} = s\sqrt{\frac{N}{N-1}},$$

where $s$ is the standard deviation of the sample (computed with divisor $N$).

In other words, the maximum-likelihood estimate of the standard deviation of a normal population that is independent of the population mean is given by the standard deviation of the sample multiplied by $\sqrt{N/(N-1)}$. When the mean and standard deviation are estimated jointly, the maximum-likelihood estimate of the standard deviation of the population is the standard deviation of the sample. The factor $\sqrt{N/(N-1)}$ that occurs in the independent estimate may thus be viewed as a correction factor that makes allowance for neglect of the estimate of the population mean.

1 See pp. 290-294.
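Here is a minimal sketch of the independent estimate. It assumes the following factorization of the normal likelihood (up to a constant):

$$\sigma^{-(N-1)} e^{-Ns^2/2\sigma^2} \times \sigma^{-1} e^{-N(\bar{X}-\mu)^2/2\sigma^2}$$

This factorization follows from $\sum (X-\mu)^2 = Ns^2 + N(\bar{X}-\mu)^2$. The text itself defers the derivation to pp. 290-294. The sketch maximizes the logarithm of the first factor by a crude grid search and compares the result with $s\sqrt{N/(N-1)}$. The sample is hypothetical.

```python
import math
import statistics


def log_sigma_factor(sigma: float, s: float, n: int) -> float:
    """log of sigma**-(n-1) * exp(-n*s**2 / (2*sigma**2)): the factor of the
    normal likelihood that involves sigma but not the population mean."""
    return -(n - 1) * math.log(sigma) - n * s ** 2 / (2 * sigma ** 2)


xs = [9.8, 10.4, 10.1, 9.6, 10.3, 10.0, 9.9, 10.5]   # hypothetical sample
n = len(xs)
mean = sum(xs) / n
s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)  # divisor N

closed_form = s * math.sqrt(n / (n - 1))

# Crude grid search from 0.5*s to 2*s in steps of 0.0001*s.
grid = [s * (0.5 + i / 10_000) for i in range(1, 15_001)]
numeric = max(grid, key=lambda g: log_sigma_factor(g, s, n))

print(round(closed_form, 4), round(numeric, 4))
# statistics.stdev uses divisor N - 1, so it agrees with the closed form.
print(math.isclose(closed_form, statistics.stdev(xs)))
```

Multiplying $s$ by $\sqrt{N/(N-1)}$ is the same as replacing the divisor $N$ by $N - 1$ in the sample variance.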

Maximum-likelihood Statistics as “Optimum” Statistics. 

Statistics derived by the method of maximum likelihood are deemed to be “optimum statistics” in certain senses. 1 First, these statistics are said to be “consistent.” By this it is meant that as the size of the sample is increased, the sample statistic approaches closer and closer to the population parameter it is used to estimate. A more precise statement is that the sampling distribution of the given statistic becomes more and more concentrated around the value of the population parameter, and the probability of any given finite deviation from the population value becomes less and less. Such an approach of the sample value to the population value as the size of the sample is increased is spoken of as “stochastic convergence.” A maximum-likelihood estimate is thus a consistent statistic in that it approaches the population value stochastically.
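Stochastic convergence can be seen exactly in a simple case. For illustration, the sketch assumes a twofold population with 10 per cent defectives, the case taken up in Chapter IX. In that case the sample proportion is the maximum-likelihood estimate. The sketch sums binomial probabilities to find the chance that the sample proportion misses the population value by more than .05, for increasing $N$. The threshold .05 and the sample sizes are arbitrary choices.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(N1 = k) for a sample of n from a twofold population, 0 < p < 1.
    Computed through logarithms so that large n does not overflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1 - p))


def prob_miss(n: int, p: float, eps: float) -> float:
    """P(|N1/N - p| > eps) under repeated random sampling."""
    return sum(binomial_pmf(k, n, p) for k in range(n + 1) if abs(k / n - p) > eps)


for n in (10, 100, 1000, 5000):
    print(n, prob_miss(n, 0.10, 0.05))
```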

1 See Fisher, R. A., “On the Mathematical Foundations of Theoretical Statistics,” Philosophical Transactions of the Royal Society of London, Series A, Vol. 222 (1922), pp. 309-368.

Maximum-likelihood estimates are also optimum statistics in that they are “efficient.” By this it is meant that in the limit as the size of the sample is indefinitely increased there is no other statistic that has a smaller sampling variance. For a finite sample some other statistic used to estimate the same population parameter may have a somewhat smaller sampling variance, but as the size of the sample is increased the sampling variance of the maximum-likelihood statistic will tend to become at least as small as that of the other statistic. Maximum-likelihood statistics are thus said to have sampling variances that are a minimum asymptotically.
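A small simulation can make the comparison concrete. The sketch assumes a normal population, for which the sample mean is the maximum-likelihood estimate of the population mean. It compares the sampling variance of the mean with that of the sample median, another statistic that estimates the same parameter. The population values, sample size, number of trials, and seed are arbitrary choices. For a normal population the ratio of the two variances is known to approach $\pi/2 \approx 1.57$ in large samples.

```python
import random
import statistics


def sampling_variances(n: int, trials: int, seed: int = 1) -> tuple[float, float]:
    """Empirical sampling variances of the mean and the median of samples of
    size n drawn from a normal population with mean 0 and s.d. 1."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)


var_mean, var_median = sampling_variances(n=101, trials=4000)
print(var_mean, var_median, var_median / var_mean)
```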

A third optimum characteristic of maximum-likelihood estimates is their “sufficiency.” This characteristic ensures that, when a maximum-likelihood estimate has been made of a population parameter, no further information about that parameter can be obtained from any statistic that is independent of (i.e., not functionally related to) the maximum-likelihood statistic. Again, this characteristic is obtained only in the limit as the size of the sample is indefinitely increased.

Maximum-likelihood statistics are thus optimum statistics 
in three ways. They are consistent, efficient, and sufficient. 

STRATIFIED, OR REPRESENTATIVE, RANDOM SAMPLING 

Often sampling fluctuations can be greatly reduced by use of partial or supplementary knowledge about a population instead of relying entirely upon the sampling process itself. Suppose, for example, that religious preference is known to be an important influencing factor in the division of public opinion on a given question. And suppose further that census data are available giving the number of people in the given community belonging to each religious category. If reliance were placed upon random sampling alone to give a representative sample of public opinion, it would have to be hoped that the sample was representative of the proportion of adherents in each religious category as well as of the division of public opinion within each. When knowledge of the relative number of people in each religious category is available, however, there is no need to rely upon sampling for it. In this case the sampling can be so devised that the number of people sampled in each group is proportional to the number of people in that group in the whole population. The randomness of the sampling will in this instance be restricted to the selection of the individuals within each religious category. This is known as “stratified” or “representative” random sampling. It is especially important in public-opinion analysis, index-number construction, 1 and the like.

The significance of stratified, or representative, random sampling is that it reduces sampling errors. It makes use of knowledge of the correlation between the variable being studied and one or more other variables about which information is available. By using this correlation it diminishes the extent of the chance fluctuations. 2
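The reduction can be put in figures for proportional allocation. The sketch assumes hypothetical strata weights $W_h$ (shares of the population) and hypothetical within-stratum proportions $p_h$ holding a given opinion. It ignores the finite-population correction. Under simple random sampling the variance of the sample proportion is $P(1-P)/n$, where $P = \sum W_h p_h$. Under proportional stratification it is $\sum W_h p_h(1-p_h)/n$. The difference is the between-strata term $\sum W_h (p_h - P)^2/n$, which is never negative.

```python
def srs_variance(weights: list[float], props: list[float], n: int) -> float:
    """Variance of the sample proportion under simple random sampling."""
    p = sum(w * q for w, q in zip(weights, props))
    return p * (1 - p) / n


def stratified_variance(weights: list[float], props: list[float], n: int) -> float:
    """Variance under proportional allocation: within-stratum terms only."""
    return sum(w * q * (1 - q) for w, q in zip(weights, props)) / n


# Hypothetical community with three religious categories.
weights = [0.5, 0.3, 0.2]      # shares of the population (sum to 1)
props = [0.7, 0.4, 0.2]        # proportion in favour within each category
n = 1000

v_srs = srs_variance(weights, props, n)
v_strat = stratified_variance(weights, props, n)
print(v_srs, v_strat, v_strat / v_srs)
```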

PURPOSIVE SAMPLING 

Some sampling methods do not employ the random technique but select their samples to conform to chosen criteria. This is spoken of as “purposive sampling.” For example, suppose a given research bureau is in search of some particular knowledge regarding the urban population of the United States. To get this they may go over carefully all the cities in the country and select a city that is “typical,” say, as to size, as to proportion of heavy and light industries in the city, as to percentage of foreign born, etc. This city then will be taken as a typical sample of American cities, and the data it reveals on the problem in question will be considered as typical of American urban population.

Purposive sampling is also used in combining census data in 
various ways, in order to save the time and money that would 
be required by use of the complete data. In the Danish census 
of 1923, for example, it was decided to pick a representative 
sample that was to be about 20 per cent of the whole country 
and also 20 per cent of the total in each of the country’s 22 
counties. 3 The procedure for securing this sample was as 
follows: 

For each of the 1,300 parishes in the country, the number of cows per 100 hectares of farm area was computed, and the parishes in each county were grouped according to the magnitude of this ratio, five groups being distinguished in each county. From each of these groups the parish was selected whose agricultural area most nearly amounted to one-fifth the total agricultural area of the group. In some cases two or more parishes were combined, or a single parish divided, in order to get this 20 per cent sample. Next, the parishes so selected were plotted on a map to see whether the various parts of the country were about equally well represented. When this was not the case, certain parishes with approximately the same number of cows per 100 hectares as parishes included in the sample were substituted for the latter so as to improve the geographical distribution. Finally, this preliminary sample was tested with regard to special factors such as the number of farms, total agricultural area, grain area, number of milch cows, number of pigs, etc., to see whether the sample represented approximately one-fifth of the whole country with respect to these factors. Where there were marked divergences in this respect, adjustments were made to bring the sample to approximately 20 per cent of the whole for these auxiliary items without at the same time disturbing the area relationship or the geographical representation.

1 See Smith and Duncan, op. cit., Chap. XIX.

2 It will be recalled that the variation around a line of regression, i.e., the scatter, or second-order variance, is never greater than the total variation in the dependent variable.

3 See Jensen, Adolph, “The Representative Method in Practice,” Bulletin de l’Institut international de statistique, tome 22 (1926), 1re livraison, pp. 420-421.

Purposive sampling of this kind is hard to appraise. The argument is that if a sample is representative in certain respects, it will be representative in other respects, but this does not necessarily follow. Samples of this kind depend largely on the judgment of those making them and are worth just about as much. 1

1 Cf. remarks on p. 154. 



CHAPTER IX 


SAMPLING FROM A DISCRETE TWOFOLD POPULATION 

Sampling from a discrete population in which all the elements 
fall into one or the other of two categories is one of the simplest 
sampling problems. This type of problem is illustrated when 
public opinion is being investigated, when a manufacturer is 
testing the quality of a given production process, or when a 
sociologist is determining the ratio of male to female births. 

Sampling from a discrete twofold population affords an opportunity to study in a more detailed yet relatively simple way most of the aspects of sampling theory sketched in the previous chapter. The following discussion consequently offers an easy approach to the general principles of sampling, and the reader is urged to master it completely. The arguments, for the most part, will follow the essential steps that were outlined in the preceding chapter. Chapter XIII generalizes the argument for a manifold population.

THE PROBLEM AND INITIAL ASSUMPTION 

To avoid excessive abstraction the argument will be expounded with reference to a concrete example. Suppose that a manufacturer has produced an order of machine parts numbering several hundred thousand pieces. According to the standards of the trade, an order of this kind is not acceptable if more than 10 per cent of the parts are defective. Routine inspection of each part is costly, however, and the only feasible method of testing the lot is to take a sample. The problem then arises as to what inferences can be made about the percentage of defectives in the whole lot from the determination of the percentage of defectives in the sample.

If the sample yields 11 per cent defectives, is it a reasonable hypothesis that the percentage of defectives in the whole lot is only 10 per cent? What are the limits within which the percentage of defectives in the whole lot may reasonably be considered to lie? What is the best estimate that can be made of the percentage of defectives in the whole lot? It is with these questions that the following analysis is concerned.

Assumptions. In selecting the size of the sample to be tested 
the manufacturer is pulled in opposing directions. The larger 
the sample tested, the greater the expense to the manufacturer. 
He will seek, therefore, to test as small a sample as is practicable. 
On the other hand, in choosing a small sample he increases the 
risk of having the whole lot rejected when in fact it may be 
above standard. 1 

In the ensuing argument it will be assumed that the sample selected is small relative to the size of the population, although it may be large absolutely. More precisely, it will be assumed that the withdrawal of the sample parts does not materially change the percentage of defective and nondefective parts in the whole lot. Suppose, for example, that the initial lot consists of 500,000 parts and that a sample of 2,500 is taken. If 10 per cent of the whole lot were defective and 40 per cent of the sample were defective (an extremely unlikely result), the percentage of defectives remaining in the lot after the entire sample had been taken would still be 9.85, which differs but little from 10 per cent. In what follows it will be assumed that, as the sample is drawn, the percentage of defectives in the whole lot remains unchanged. This will greatly simplify the analysis. 2
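The 9.85 figure can be checked directly. Integer arithmetic is used so that the counts of parts are exact.

```python
lot = 500_000
sample = 2_500
defective_in_lot = lot * 10 // 100          # 10 per cent: 50,000 parts
defective_in_sample = sample * 40 // 100    # 40 per cent: 1,000 parts

remaining_share = (defective_in_lot - defective_in_sample) / (lot - sample)
print(f"{100 * remaining_share:.2f} per cent")   # 9.85 per cent
```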

It will further be assumed that the process of sampling is a random one, as described in the previous chapter. The full implication of this will be understood only as the analysis proceeds. For the moment it may be noted that with randomness the results of repeated sampling should be predictable by mathematical probabilities calculated from some appropriate mathematical model.

DISTRIBUTION OF SAMPLE PERCENTAGES 

In the problem in hand the population of parts is divided into two groups, defective parts and nondefective parts; it is therefore a twofold population. The problem is to make inferences regarding the percentage of defective parts in the population from a sample set of data. The only sample statistic that is appropriate for making inferences about the population percentage is the percentage of defective parts in the sample. For subsequent analysis, therefore, it will be necessary to have the distribution of sample percentages from a twofold population, i.e., a formula giving the probabilities of sample percentages taking on various values in repeated random sampling from a given twofold population. The derivation of this sampling distribution is the next task.

1 See pp. 171-172.

2 For a discussion of the complications that arise when this assumption cannot be made, see pp. 209-211.

Derivation of the Sampling Distribution. To derive a sampling distribution it is necessary to find a mathematical model that symbolizes the conditions of sampling and to use this model for the calculation of the desired probabilities. In drawing up the model, therefore, the first step is to note carefully the conditions of sampling. In the present problem these may be described as follows:

Conditions of Sampling. Prior to the withdrawal of any sample, let the percentage of defective parts in the whole population be $p_1$, and let the percentage of nondefective parts be $p_2$. After the withdrawal of the first member of a sample, the percentages of defective and nondefective parts remaining in the population will not, if the population is finite, be exactly $p_1$ and $p_2$. It is one of the fundamental assumptions of the problem, however, that the changes in $p_1$ and $p_2$ will be so slight that they may be ignored. Hence the percentages of defective and nondefective parts in the population when the second member of the sample is to be drawn will still be practically $p_1$ and $p_2$. The same reasoning may be applied to subsequent withdrawals, so that on each draw the percentages of defective and nondefective parts in the population will be for all practical purposes $p_1$ and $p_2$. The withdrawal of a sample of $N$ cases from the given population without replacement will therefore be considered to give practically the same results as withdrawals with replacement. The following analysis will assume, then, that a sample item is replaced before the next item is drawn.

The immediate problem is a study of what will happen under repeated sampling. If the sampling process is random, each member of the population will have an equal chance of being drawn on every occasion. This means that as a large number of samples are drawn, with replacements, every member of a population will presumably be drawn sometime or other with every member of every other population, and in the long run no particular combination of members will appear any more frequently than any other combination. 1

1 It should be carefully noted here that the members are identified as individuals, so that a particular combination means a particular combination of members. Thus a combination of three members in which the first two were defective and the last nondefective would not be the same combination as one in which the first was defective and the last two nondefective.

The Mathematical Model. These conditions of sampling suggest that the sampling distribution of percentages may be derived from the following model: Suppose there are ten packs of cards in each of which $p_1$ per cent are black cards and $p_2$ per cent are white cards. All possible combinations of 10 cards each are formed by combining without restriction each card of each pack with every card of every other pack. Since the probability (relative frequency) of a black card in each of the 10 packs from which the various cards are selected is $p_1$ and the probability of a white card is $p_2$, and since the combination of cards is without restriction so that the selection of a black or white card from any pack does not affect the probabilities in other packs, then, according to the multiplication theorem, the probability of a combination in which, for example, the first 4 cards are black and the last 6 are white is $p_1^4 p_2^6$. But the same would be true for any combination having 4 black cards and 6 white cards. Since there are $C^{10}_{4} = \frac{10!}{4!\,6!} = 210$ ways of picking 10 cards so that 4 of them will be black, there will be 210 such combinations. The probability of a combination having 40 per cent black cards and 60 per cent white will therefore be $210\,p_1^4 p_2^6$. In general, for $N$ packs, the probability of a combination in which the proportion of black cards is $N_1/N$ and that of white cards is $(N - N_1)/N$ will be

$$C^{N}_{N_1}\, p_1^{N_1} p_2^{N-N_1} = \frac{N!}{N_1!\,(N-N_1)!}\, p_1^{N_1} p_2^{N-N_1}.$$

If this mathematical model is correct, the equation just given will yield the probabilities of samples having a proportion $N_1/N$ of defective parts and $(N - N_1)/N$ of nondefective parts in the whole set of samples of $N$ each that might be drawn at random (with replacements) from a population containing $p_1$ per cent defective and $p_2$ per cent nondefective parts. For, as noted above, if the process of sampling is random and each sample item is replaced before the next is drawn, it is reasonable to suppose that every particular combination of parts will appear as frequently as every other particular combination of parts. Then the probability of samples having a proportion $N_1/N$ of defective and $(N - N_1)/N$ of nondefective parts will tend to be the same as the probability of combinations of cards having a proportion $N_1/N$ of black and $(N - N_1)/N$ of white cards. These combinations are taken among the set of all possible combinations of cards that might be made of one card from each of $N$ different packs, each pack containing $p_1$ per cent black and $p_2$ per cent white cards. Although this model is derived for sampling with replacements, it also gives approximate results, as noted above, for sampling without replacements. 1
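The counting argument of the card model can be checked by brute force. The sketch below enumerates every colour pattern obtainable by taking one card from each of 10 packs. It tracks colours rather than individual cards, and it weights each pattern by the multiplication theorem. The values $p_1 = .3$ and $p_2 = .7$ are hypothetical.

```python
import itertools
import math

p1, p2 = 0.3, 0.7      # hypothetical shares of black and white cards per pack
n_packs = 10

count = 0
total = 0.0
for pattern in itertools.product("BW", repeat=n_packs):   # one card per pack
    if pattern.count("B") == 4:
        count += 1
        total += p1 ** 4 * p2 ** 6        # same probability for every such pattern

print(count)                                                      # 210
print(math.isclose(total, math.comb(10, 4) * p1 ** 4 * p2 ** 6))  # True
```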

If the percentage of defective parts in the sample, $N_1/N$, is chosen as the sample statistic, the sampling distribution of this statistic will therefore be

$$P\!\left(\frac{N_1}{N}\right) = \frac{N!}{N_1!\,(N-N_1)!}\, p_1^{N_1} p_2^{N-N_1} \qquad (1)$$

This will be recognized as the general equation for a binomial distribution. 2 Hence it follows that the sampling distribution of a percentage has 3 a mean equal to $p_1$, a standard deviation equal to $\sigma = \sqrt{p_1p_2/N}$, and

$$\beta_1 = \frac{(p_2 - p_1)^2}{Np_1p_2}, \qquad \beta_2 = 3 + \frac{1 - 6p_1p_2}{Np_1p_2}.$$

1 See p. 188.

2 See p. 43.

3 Cf. p. 45. Since the attribute is taken as $N_1/N$ instead of $N_1$, as in the previous discussion of the binomial distribution, the mean of $N_1/N$ is $Np_1/N = p_1$ and the standard deviation of $N_1/N$ is $\sqrt{Np_1p_2}/N = \sqrt{p_1p_2/N}$. Since $\beta_1$ and $\beta_2$ are coefficients, their values are unaffected by this change in units.

Factors Affecting the Distribution. Effect of Size of the Sample. The foregoing equations show that the character of the sampling distribution of a sample percentage from a twofold population is dependent on the size of the sample $N$. With a small value of $N$, the distribution of probabilities may be very skewed, especially if $p_1$ is markedly different from $p_2$, and the standard deviation will be large. With a large value of $N$, the distribution will be more symmetrical ($\beta_1$ approaches 0 as $N$ increases), and the standard deviation will be small. If large samples are taken, therefore, the probabilities of samples in which the percentage of a selected attribute deviates slightly from the population percentage will be more or less symmetrically distributed about the population percentage as a central value. The probabilities of samples in which the percentage deviates by a large amount from the population percentage will be very small. That is, the larger a sample, the smaller the probability of its differing greatly from the population.
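The formulas for the mean, the standard deviation, $\beta_1$, and $\beta_2$ can be confirmed by summing over Eq. (1) directly. In this sketch $p_1 = .10$ is an arbitrary choice. Comparing $N = 10$ with $N = 100$ shows $\beta_1$ shrinking toward 0 and $\sigma$ shrinking with $\sqrt{N}$, as stated above.

```python
import math


def moments_of_percentage(n: int, p1: float) -> tuple[float, float, float, float]:
    """Mean, s.d., beta1 and beta2 of N1/N, obtained by summing over the exact
    binomial probabilities of Eq. (1)."""
    p2 = 1 - p1
    values = [k / n for k in range(n + 1)]
    probs = [math.comb(n, k) * p1 ** k * p2 ** (n - k) for k in range(n + 1)]
    mean = sum(p * x for p, x in zip(probs, values))
    mu2, mu3, mu4 = (sum(p * (x - mean) ** r for p, x in zip(probs, values))
                     for r in (2, 3, 4))
    return mean, math.sqrt(mu2), mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def formula_values(n: int, p1: float) -> tuple[float, float, float, float]:
    """The same four quantities from the closed-form expressions in the text."""
    p2 = 1 - p1
    return (p1, math.sqrt(p1 * p2 / n),
            (p2 - p1) ** 2 / (n * p1 * p2), 3 + (1 - 6 * p1 * p2) / (n * p1 * p2))


for n in (10, 100):
    print(n, moments_of_percentage(n, 0.10))
    print(n, formula_values(n, 0.10))
```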

A further consequence of increasing the size of the sample $N$ is that the distribution of probabilities tends to approach the normal curve whose mean and standard deviation are the same as those of the binomial distribution. 1 Hence, in large samples, the probabilities of various types of samples can be computed from a normal probability table. 2 This is an especially important conclusion for the practical application of the mathematical model to the problem.
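How good the normal approximation is for a large sample can be gauged by computing a tail probability both ways. The sketch assumes $p_1 = .10$ and $N = 2{,}500$. It compares the exact binomial probability that $N_1/N$ falls at or below .0882 with the normal-curve value. `statistics.NormalDist` is from the Python standard library.

```python
import math
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact P(N1 = k), computed through logarithms to avoid overflow."""
    log_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_c + k * math.log(p) + (n - k) * math.log(1 - p))


n, p1 = 2_500, 0.10
cutoff = 0.0882
exact = sum(binomial_pmf(k, n, p1) for k in range(n + 1) if k / n <= cutoff)
approx = NormalDist(mu=p1, sigma=math.sqrt(p1 * (1 - p1) / n)).cdf(cutoff)
print(round(exact, 4), round(approx, 4))
```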

1 Cf. pp. 46-47.

2 Actually the ordinates of the binomial distribution are approached by those of the normal curve (see Appendix to Chap. IV, pp. 68-74). If, however, the probability of a given value of $N_1/N$ is represented, not by the height of a line or bar, but by the area of a rectangle whose base is $d$ and whose height is $P/d$, then as $N$ is increased (and hence $d$, which equals $1/\sqrt{Np_1p_2}$, is decreased) the rectangles become thinner and thinner and their area tends to be approximated by that of the standard normal curve.

Effect of the Population Parameter $p_1$. The sampling distribution described by Eq. (1) gives the probabilities of various types of samples on the assumption that the percentage of defective parts in the whole population is $p_1$. As the value of $p_1$ is varied, the character of the sampling distribution will be changed. For the subsequent discussion it is important to note the nature of these changes.

For samples of size 10 (this small size is taken temporarily for purposes of exposition), Fig. 53 and Table 23 show how the distribution of a sample percentage changes as the value of $p_1$ is varied from 0 to 1 by intervals of .05. It will be noted that from $p_1 = 0$ to $p_1 = .5$ the sampling distribution changes from a positively skewed distribution to a symmetrical one, and then from $p_1 = .5$ to $p_1 = 1$ it changes to a negatively skewed distribution. The reader should here center his attention on the character of the rows of Table 23 or on the left-right variation of Fig. 53 as he proceeds from back to front. It will also be noted that for a given sample percentage $N_1/N$ the probability of the sample rises as the population percentage $p_1$ approaches the point $p_1 = N_1/N$. That is, the probability of a given sample result is a maximum when $p_1 = N_1/N$. Here the reader should center his attention on the variation from row to row for a given column of Table 23 or from back to front for a given value of $N_1/N$ in Fig. 53.

Fig. 53. The variation in the character of the distribution of a sample percentage with changes in the population parameter $p_1$.
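A column of a table like Table 23 can be recomputed from Eq. (1). The sketch fixes a hypothetical sample result of $N_1 = 3$ defectives in $N = 10$ and steps $p_1$ from 0 to 1 by .05. It reports the value of $p_1$ at which the probability of that result is largest. The values it prints are computed here and are not copied from Table 23.

```python
import math

n, n1 = 10, 3                        # hypothetical sample: 3 defectives in 10
grid = [i / 20 for i in range(21)]   # p1 = 0, .05, ..., 1
probs = {p1: math.comb(n, n1) * p1 ** n1 * (1 - p1) ** (n - n1) for p1 in grid}

for p1, pr in probs.items():
    print(f"{p1:.2f}  {pr:.4f}")
best = max(probs, key=probs.get)
print("maximum at p1 =", best)       # 0.3, i.e. N1/N
```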

If the population is very large, variation in $p_1$ can be assumed to be practically continuous; that is, $p_1$ can be given any hypothetical value between 0 and 1. On the other hand, for samples of size 10, only 11 values of $N_1/N$ are possible; variation in this quantity is therefore discrete. Accordingly, Fig. 53 consists essentially of a series of curves running from back to front and separated from left to right by intervals of .1. If sampling is made with a much larger sample, say a sample of 2,500, the variation in $N_1/N$ may also be considered to be practically continuous. In that instance, Fig. 53 can be replaced by the smooth surface of Fig. 54. As pointed out in the previous section, except for the extreme ends where $p_1$ is very small, the left-to-right sections of this surface are practically normal curves.
TESTING HYPOTHESES

Up to this point the argument has been purely deductive. Given a certain population, it has been asked, how frequently will random sampling produce certain types of samples? The crucial part of the argument is the reverse of this: Given a certain sample, what inferences can be made about the population from which it was obtained? It is this part of the argument that will now be examined.

The Hypothesis. With reference to the original illustration, suppose that the manufacturer of machine parts is particularly interested in the hypothesis that the whole set or population of parts has just the standard 10 per cent of defectives. He takes a sample of 2,500 parts and finds that 11 per cent of the sample are defective. How does his hypothesis fare in the light of this sample result?
Coefficient of Risk. The first step is to decide upon the coefficient of risk. The manufacturer recognizes, of course, that even if the number of defective parts in the whole population is just 10 per cent, he might by chance get a sample in which the percentage of defective parts is much greater or less than 10 per cent. There is always some risk that he will reject the 10 per cent hypothesis when it is actually true. The initial step, therefore, is to decide how great a risk of this kind he is willing to run. Suppose he decides that on the average he does not want to reject the hypothesis when it is true more than 5 times out of 100. He will then adopt a coefficient of risk of .05.

A .05 Region of Rejection. From the analysis of the previous section, the manufacturer knows that if he takes a sample of 2,500 from a large population, the probabilities of the various possible types of samples will be given approximately by a normal probability curve. This curve has its mean at the population percentage $p_1$ and its standard deviation equal to $\sigma = \sqrt{p_1p_2/N}$. From this he knows that if the number of defectives in the whole population is actually 10 per cent ($p_1 = .10$), then the probability of obtaining a sample in which the percentage of defectives is equal to or less than $.10 - 1.96\sigma$ is .025 (see normal table in Appendix, Table VI). He also knows that the probability of obtaining a sample in which the percentage of defectives is equal to or greater than $.10 + 1.96\sigma$ is likewise .025. Hence, according to the addition theorem, the probability of a sample in which the percentage of defectives lies outside the limits $.10 - 1.96\sigma$ and $.10 + 1.96\sigma$ is $.025 + .025 = .05$. Since $\sigma = \sqrt{(.10)(.90)/2{,}500} = .006$, the actual values of these limits are 8.82 per cent and 11.18 per cent (see Fig. 55). It follows, therefore, that the risk of rejecting the hypothesis when it is true will be just .05 if the manufacturer rejects it whenever a sample contains defective parts numbering less than 8.82 per cent or more than 11.18 per cent. For if the number of defectives in the whole population is exactly 10 per cent, samples will fall in these extreme regions (below 8.82 per cent and above 11.18 per cent) only 5 per cent of the time. Hence, if the manufacturer follows this rule he will, on the average, reject the hypothesis when it is true only 5 times out of 100, which is the degree of risk he is willing to undergo.

Fig. 56. An alternative region of rejection for testing the hypothesis $p_1 = .10$. Coefficient of risk = .05.
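The arithmetic of the two-sided region can be reproduced with the normal curve from the Python standard library (`statistics.NormalDist`). The multiplier 1.96 is taken from the text rather than from an inverse-normal call, so the computed risk comes out as .05 to three decimals.

```python
import math
from statistics import NormalDist

p1, n = 0.10, 2_500
sigma = math.sqrt(p1 * (1 - p1) / n)            # .006
lower = p1 - 1.96 * sigma                       # limit of the lower tail
upper = p1 + 1.96 * sigma                       # limit of the upper tail
print(round(sigma, 4), round(100 * lower, 2), round(100 * upper, 2))  # 0.006 8.82 11.18

null = NormalDist(mu=p1, sigma=sigma)           # sampling curve if p1 = .10 is true
risk = null.cdf(lower) + (1 - null.cdf(upper))  # addition theorem: .025 + .025
print(round(risk, 3))                           # 0.05
```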
Alternative Regions of Rejection. It will be noted, however, that the suggested region of rejection is not the only one that the manufacturer might adopt. He can follow the rule of rejecting the hypothesis whenever the sample percentage falls above .10 + 1.645σ, that is, above 10.99 per cent, and his risk of rejecting the hypothesis when it is true will still be only .05. For, as will be noted from a table of the normal curve, the probability of a positive deviation from the mean of 1.645σ or more is just .05. The region above 10.99 per cent is therefore an alternative region of rejection with a coefficient of risk of .05. This is illustrated in Fig. 56. Still a third region with a coefficient of risk of .05 is the region lying below .10 − 1.645σ, that is, below 9.01 per cent. For, owing to the symmetry of the normal curve, the probability of a negative deviation from the mean of 1.645σ or more is also .05. This third alternative region of rejection is illustrated in Fig. 57. Yet a fourth region with a coefficient of risk of .05 is the region lying below .10 − 2.327σ and above .10 + 1.751σ, that is, below 8.60 per cent and above 11.05 per cent. This is illustrated in Fig. 58.
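Each of the four regions can be checked to carry the same coefficient of risk. This sketch assumes, as above, that σ = .006 is computed under the hypothesis p₁ = .10; the `regions` dictionary is an illustrative encoding in which `None` marks a side with no rejection tail. The quantile function gives 2.3263 and 1.7507 where the text rounds to 2.327 and 1.751.

```python
from math import sqrt
from statistics import NormalDist

P0, N = 0.10, 2500
SIGMA = sqrt(P0 * (1 - P0) / N)      # .006 under the hypothesis
z = NormalDist().inv_cdf             # standard normal quantile function
H0 = NormalDist(P0, SIGMA)           # distribution of sample percentages if p1 = .10

# (lower cutoff, upper cutoff); None means that side has no rejection tail.
regions = {
    "Fig. 55": (P0 - z(0.975) * SIGMA, P0 + z(0.975) * SIGMA),
    "Fig. 56": (None, P0 + z(0.95) * SIGMA),
    "Fig. 57": (P0 - z(0.95) * SIGMA, None),
    "Fig. 58": (P0 - z(0.99) * SIGMA, P0 + z(0.96) * SIGMA),
}

def risk(lower, upper):
    """Probability of rejecting p1 = .10 when it is actually true."""
    below = H0.cdf(lower) if lower is not None else 0.0
    above = 1 - H0.cdf(upper) if upper is not None else 0.0
    return below + above

def pct(x):
    """Format a cutoff as a percentage, or 'none' for an unused tail."""
    return "none" if x is None else f"{100 * x:.2f}%"

for name, (lo, hi) in regions.items():
    print(f"{name}: below {pct(lo)}, above {pct(hi)}, risk {risk(lo, hi):.3f}")
```

All four lines report a risk of .050; only the placement of the cutoffs changes.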

In fact, there are an infinite number of regions of rejection 
that the manufacturer might adopt, all of which have associated 
with them a coefficient of risk of .05. What is the criterion that 
he should follow in choosing from among these various possible 
regions? 
The Best Region of Rejection. Each region of rejection has a corresponding region of acceptance. As suggested in the previous chapter, the “best” region of acceptance would be the one for which the risk of accepting a given hypothesis when it is not true is a minimum. Since the probability of a sample falling in the region of acceptance is equal to 1 minus the probability of its falling in the region of rejection, the former will be a minimum when the latter is a maximum. The best region of rejection, therefore, will be a region for which the probability of rejection when the given hypothesis is not true is a maximum.¹

How the various regions suggested in the previous paragraphs fare in respect to this criterion is indicated by the probabilities shown in Table 24. Table 24 gives data for each of the four alternative regions, showing how the probability of a sample falling in the region varies with the population percentage p₁.

1 In technical language this probability of rejection when the hypothesis 
is not true is called the “power” of the test associated with the selected 
region. The criterion is to select the region with the highest power. The 
region so selected will then give the “most powerful” test. 
The method of computing the data in Table 24 is illustrated in Fig. 59, and the results are presented in Fig. 60, which is based upon Table 24.

It is obvious from this figure that there is no one region for which the probability of a sample is the greatest for all possible values of the population percentage different from the one set up by hypothesis. The region above .1099 (the region of Fig. 56) has the highest probability for population percentages greater than the hypothetical percentage being tested (that is, p₁ = .10 in this case), and the region below .0901 (the region of Fig. 57) has the highest probability for population percentages less than the hypothetical percentage. The region below .0882 and above .1118 (the region of Fig. 55) does well for values of p₁ both greater and less than the hypothesis being tested, but it does not have the maximum probability for any values. There is not in this instance any single region of rejection that is best for all problems.
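The probabilities in Table 24 come from evaluating, for each alternative value of p₁, the chance that a sample falls in a given region, with σ recomputed from that p₁ while the cutoffs stay fixed at their values under the hypothesis. The following sketch uses the same normal approximation; the names `power` and `best_region` and the region encoding are illustrative. A few computed entries differ from the printed table by more than rounding, so treat this as an illustration of the method rather than a reproduction of Table 24.

```python
from math import sqrt
from statistics import NormalDist

P0, N = 0.10, 2500
SIGMA0 = sqrt(P0 * (1 - P0) / N)
q = NormalDist().inv_cdf

# Cutoffs fixed under the hypothesis p1 = .10; None means no tail on that side.
REGIONS = {
    "55": (P0 - q(0.975) * SIGMA0, P0 + q(0.975) * SIGMA0),
    "56": (None, P0 + q(0.95) * SIGMA0),
    "57": (P0 - q(0.95) * SIGMA0, None),
    "58": (P0 - q(0.99) * SIGMA0, P0 + q(0.96) * SIGMA0),
}

def power(region, p):
    """Probability that a sample falls in the region when the population
    percentage is p (sigma recomputed from p)."""
    lower, upper = REGIONS[region]
    dist = NormalDist(p, sqrt(p * (1 - p) / N))
    prob = 0.0
    if lower is not None:
        prob += dist.cdf(lower)
    if upper is not None:
        prob += 1 - dist.cdf(upper)
    return prob

def best_region(p):
    """Region (figure number) with the highest probability of rejection at p."""
    return max(REGIONS, key=lambda r: power(r, p))

for i in range(91, 110):
    p = i / 1000
    cells = " ".join(f"{power(r, p):.4f}" for r in REGIONS)
    print(f"{p:.3f}  {cells}  best: Fig. {best_region(p)}")
```

The last column shows the pattern described above: Fig. 57 wins for every p₁ below .10, Fig. 56 for every p₁ above, and Fig. 55 never wins.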

In order to determine the region of rejection that is best for the present problem, the manufacturer must draw upon other considerations. Suppose he wishes particularly to avoid unnecessary and expensive refinements in production methods and accordingly aims to keep the percentage of defectives as close as possible to the accepted standard of 10 per cent defective parts. At the same time, suppose he does not care if orders are sent out that are below standard; it may be that the purchaser will carry out his own test to avoid buying a substandard set. Accordingly, the manufacturer will select the region of Fig. 57, i.e., below .0901, as the best region for his purpose. He will thus accept the hypothesis that the number of defectives in the population is 10 per cent least often when that percentage is actually below 10 per cent.

Table 24. Variations in the Probability of a Sample Falling in Given Regions for Different Values of the Population p₁

           Probability of a sample falling in the region
Value      Region of   Region of   Region of   Region of
of p₁      Fig. 55     Fig. 56     Fig. 57     Fig. 58
.091       .3157       .0005       .4404       .1952
.092       .2581       .0010       .3745       .1522
.093       .2067       .0018       .3121       .1174
.094       .1623       .0033       .2546       .0876
.095       .1272       .0055       .2033       .0671
.096       .0971       .0091       .1587       .0524
.097       .0758       .0154       .1270       .0435
.098       .0619       .0227       .0934       .0401
.099       .0521       .0344       .0694       .0464
.100       .0500       .0500       .0500       .0501
.101       .0597       .0708       .0359       .0636
.102       .0653       .0968       .0250       .0834
.103       .0824       .1292       .0174       .1102
.104       .1069       .1685       .0116       .1439
.105       .1389       .2033       .0078       .1851
.106       .1782       .2643       .0049       .2333
.107       .2218       .3228       .0032       .2846
.108       .2718       .3821       .0020       .3448
.109       .3304       .4443       .0013       .4053

1 The assumption is that if a sample shows a percentage of defectives that is considered too low, the manufacturer would discontinue some expensive refinement in the method of production that might lower the quality somewhat but would yield him more profit.
On the other hand, if the manufacturer desires especially to protect his reputation and wishes particularly to avoid sending out an order that is actually below standard, he will select the region of Fig. 56, i.e., above .1099, as the best region to use. In this instance, he will accept the hypothesis of 10 per cent least often when the population percentage is actually above 10 per cent. He runs the least risk in this case of damaging his reputation.¹

If the manufacturer wishes especially to avoid the danger of 
damaging his reputation by sending out substandard lots but 
nevertheless desires to give some consideration to the additional 
expense of attaining too high a standard, he might adopt a 
region such as that lying above .1105 and below .0860, that is, 
the region of Fig. 58. If he is equally indifferent or equally 
concerned about his reputation and the additional expense of 
attaining a high standard, he would do best to choose the region 
lying above .1118 and below .0882, the region of Fig. 55. For this
latter region favors values neither above nor below the hypothesis
being tested; it is an “unbiased” region.²

In the problem illustrated, suppose that the manufacturer 
adopts the unbiased region lying below .0882 and above .1118, 
the region of Fig. 55. He knows that if this region is always 
adopted in his sampling tests he will in the long run reject the 
10 per cent hypothesis when it is actually true only 5 times out
of 100. He also knows that the probability of accepting the
10 per cent hypothesis when it is not true is evenly distributed
with respect to actual percentages above and below 10 per cent.
He will in this instance neither cause undue damage to his reputation
nor induce himself to incur an unnecessary expense of
production.

1 That is, least risk for the size region adopted. If the manufacturer had 
adopted a coefficient of risk of .10 of rejecting the 10 per cent hypothesis 
when it was actually true, his regions of rejection would all have been larger 
and his regions of acceptance correspondingly smaller. There would be less 
risk of accepting the hypothesis in this case when it was not true. When 
it is said in the text that a certain region gives the least risk of accepting the 
hypothesis being tested when some other values are true, it is meant that 
this is the least risk for any region of the specified size, i.e., for any region 
for which the associated risk of rejection is the specified coefficient .05. 

2 Generally a region is said to be “unbiased” if the probability of a sample 
falling within it is less when the given hypothesis is true than when any 
alternative hypothesis is true. 
The Test of the Hypothesis. The actual sample result is 11 per cent. Since this does not fall in the adopted region of rejection, the hypothesis of p₁ = 10 per cent is not rejected and the given lot of parts is deemed to be of standard quality.

USE OF THE DISTRIBUTION OF A SAMPLE PERCENTAGE TO ESTIMATE THE VALUE OF THE POPULATION PERCENTAGE

The foregoing pages were primarily concerned with the rejection or acceptance of a particular hypothesis regarding the percentage of defective parts in the population. They did not indicate the limits within which this percentage might reasonably lie, nor did they indicate what might be a good estimate of the actual percentage of defectives in the whole population. Often these considerations are of more importance than the testing of a particular hypothesis.

Determining Confidence Intervals. The procedure for determining a reasonable range for a population parameter, given a particular sample, is much the same as that followed in testing hypotheses regarding that parameter. First it is necessary to decide upon the degree of risk that is to be run in failing to include the true value of the parameter within the estimated range. To put it positively, it is first necessary to determine the degree of confidence that may be had in the estimated range covering the true value. Furthermore, if no one type of range is found to be the best, it is necessary to decide whether failure to include the true value because the range is placed too low is as important as failure to include it because the range is placed too high. These matters will now be considered in some detail.

Determining an Unbiased Interval. In the example of the preceding section the manufacturer of machine parts took a sample of 2,500 parts from a large lot and found that, of these 2,500 parts, 275, or 11 per cent, were defective. On the basis of this sample, the manufacturer may determine an unbiased¹ confidence interval as follows. The confidence coefficient associated with this interval will be put at .95. First let the manufacturer find the value of p₁ that will make the difference between the sample percentage and it, that is, N₁/N − p₁, just equal to 1.96σ. Since σ = √(p₁p₂/N), where p₂ = 1 − p₁, this value of p₁ will be given by the solution of the equation

$$\frac{N_1}{N} - p_1 = 1.96\sqrt{\frac{p_1 p_2}{N}}$$

for p₁, which leads to the approximate equation¹

$$\underline{p}_1 = \frac{N_1}{N} - 1.96\sqrt{\frac{N N_1 - N_1^2}{N^3}} \qquad (2)$$

In the present instance this formula yields the result p̲₁ = .098. It will be noted that this lower limit for p₁ is designated p̲₁ (called “p₁ lower”). The procedure is illustrated in Fig. 61.

Next let the manufacturer find the value of p₁ that makes the difference between the sample percentage and it just equal to −1.96σ. This second value will be given by the equation

$$\frac{N_1}{N} - p_1 = -1.96\sqrt{\frac{p_1 p_2}{N}}$$

or by the approximate equation²

$$\bar{p}_1 = \frac{N_1}{N} + 1.96\sqrt{\frac{N N_1 - N_1^2}{N^3}} \qquad (3)$$

For the given problem this yields p̄₁ = .122. Again it will be noted that this upper limit for p₁ is designated p̄₁ (called “p₁ upper”). For illustration the reader is again referred to Fig. 61.

1 The interval is called “unbiased” because it is so determined that the probability of the interval failing to cover the true value because it is too high is equal to the probability of the interval failing to cover the true value because it is too low. It happens in the present instance that this unbiased interval is also symmetrical about the sample value. In other cases, however, it will be found that an interval that is unbiased in the probability sense is not symmetrical around the sample value (see pp. 287-289).

1 The approximate formula is obtained by substituting N₁/N for p₁ and (N − N₁)/N for p₂ in the radical on the right; that is, σ is estimated from the sample percentages. The formula given by the exact solution of the equation is

$$\underline{p}_1 = \frac{N_1 + 1.92 - 1.96\sqrt{.96 + N_1 - N_1^2/N}}{N + 3.84}$$

but when N is large, as it should be when the normal curve is taken as an approximation to the binomial distribution, this more exact formula differs but little from the approximate one given in the text.

2 The exact formula is the same as that given in the preceding footnote except that the square root is now preceded by a plus sign instead of a minus sign.
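Equations (2) and (3) and the exact roots given in the footnotes can be compared numerically. This sketch assumes z = 1.96; the footnote constants 1.92, .96, and 3.84 are z²/2, z²/4, and z² rounded, and the function names are illustrative. The exact roots are what is now commonly called the Wilson score interval.

```python
from math import sqrt

def approximate_limits(n1, n, z=1.96):
    """Eqs. (2) and (3): sigma estimated from the sample percentage."""
    half_width = z * sqrt((n * n1 - n1 ** 2) / n ** 3)
    return n1 / n - half_width, n1 / n + half_width

def exact_limits(n1, n, z=1.96):
    """Both roots of (N1/N - p1)**2 = z**2 * p1 * (1 - p1) / N."""
    root = z * sqrt(z ** 2 / 4 + n1 - n1 ** 2 / n)
    centre = n1 + z ** 2 / 2
    return (centre - root) / (n + z ** 2), (centre + root) / (n + z ** 2)

print([round(x, 3) for x in approximate_limits(275, 2500)])   # [0.098, 0.122]
print([round(x, 4) for x in exact_limits(275, 2500)])
```

For N = 2,500 the exact roots differ from Eqs. (2) and (3) by less than .001, which is the point made in the footnote.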
These two values of p₁, viz., p̲₁ and p̄₁, mark off a confidence interval that the manufacturer may claim has a probability of .95 of covering the true value. This means that, if he continuously follows the foregoing procedure in sampling from a twofold population, the interval he sets up will on the average cover the population value 95 times out of 100. For, on the assumption that the normal curve is a good approximation to the binomial distribution, sample percentages will fall within the range ±1.96σ of the population percentage 95 per cent of the time; hence, ranges of ±1.96σ about the sample percentage will include the population percentage 95 per cent of the time. It is only when the sample percentage deviates more than 1.96σ from the population value that a range of ±1.96σ about the sample percentage will fail to include the population value, and according to the normal probability curve the probability of such an event is only .05. It will be noted that the manufacturer speaks, not of the probability of the population value falling in the confidence interval, but of the probability of the interval covering the population value. For it is the interval that varies from sample to sample, not the population value; the latter is an unknown constant.
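This frequency interpretation can be checked by simulation: draw many samples of 2,500 from a population with 10 per cent defectives, set up the interval of Eqs. (2) and (3) each time, and count how often it covers .10. This is a sketch using Python's `random` module; the seed, the number of trials, and the function name are arbitrary choices for illustration, and the observed rate fluctuates around .95 from run to run.

```python
import random
from math import sqrt

def simulated_coverage(p_true=0.10, n=2500, trials=1000, z=1.96, seed=1):
    """Fraction of simulated samples whose interval N1/N +/- z*sigma_hat
    covers the population percentage p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        n1 = sum(rng.random() < p_true for _ in range(n))   # defectives in one sample
        half_width = z * sqrt((n * n1 - n1 ** 2) / n ** 3)
        if n1 / n - half_width <= p_true <= n1 / n + half_width:
            hits += 1
    return hits / trials

print(simulated_coverage())
```

The population value stays fixed at .10 throughout; only the interval moves from sample to sample.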
Other Confidence Intervals. The confidence interval just determined is not the only interval with a confidence coefficient of .95. Other intervals with the same confidence coefficient may also be derived. For example, it is possible for the manufacturer to select an interval that has only a variable lower limit, the upper limit being the maximum attainable value of p₁ = 1.00. This lower limit may be computed by finding the value of p₁ that will make the probability of getting the given sample percentage N₁/N or a greater value just .05. Such a value of the lower limit will be given by the equation N₁/N − p₁ = 1.645σ or the approximate equation

$$\underline{p}_1 = \frac{N_1}{N} - 1.645\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

For the given case, in which N = 2,500 and N₁ = 275, this lower limit will be .100. The manufacturer can therefore say with equal validity that the interval .100-1.000 has a chance of 95 out of 100 of including the population value.

Again, with the same confidence coefficient of .95, the manufacturer may choose an interval that has only a variable upper bound, the lower bound being the minimum possible value of 0. This upper limit can be computed by finding the value of p₁ that will make the probability of the sample percentage or a lower percentage just equal to .05. The value will be given by the equation N₁/N − p₁ = −1.645σ or by the approximate formula

$$\bar{p}_1 = \frac{N_1}{N} + 1.645\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

For a sample of 2,500 and a sample percentage of 11, this upper limit will be .120. Hence, with as much truth as in the other cases, the manufacturer can say that the range 0-.120 has a chance of .95 of including the population value.
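Both one-sided intervals use 1.645 in place of 1.96 and put the whole .05 in a single tail. This is a sketch with σ estimated from the sample as in Eq. (2); the function names are illustrative.

```python
from math import sqrt

def sigma_hat(n1, n):
    """Estimated sigma, sqrt((N*N1 - N1**2) / N**3)."""
    return sqrt((n * n1 - n1 ** 2) / n ** 3)

def lower_bounded_interval(n1, n, z=1.645):
    """Interval with only a variable lower limit; the upper limit is 1.00."""
    return n1 / n - z * sigma_hat(n1, n), 1.0

def upper_bounded_interval(n1, n, z=1.645):
    """Interval with only a variable upper limit; the lower limit is 0."""
    return 0.0, n1 / n + z * sigma_hat(n1, n)

lo, _ = lower_bounded_interval(275, 2500)
_, hi = upper_bounded_interval(275, 2500)
print(f"{lo:.3f} {hi:.3f}")   # 0.100 0.120
```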

Finally, the manufacturer may adopt an interval determined in the following manner: He may determine a lower limit such that, if it were the population value, the probability of getting the sample percentage of 11 per cent or a greater value will be just .04. And he may determine an upper limit such that, if it were the population value, the probability of getting the sample percentage or a lower value will be just .01. These two limits will be given by the equations

$$\frac{N_1}{N} - \underline{p}_1 = 1.751\sigma \qquad\text{and}\qquad \frac{N_1}{N} - \bar{p}_1 = -2.327\sigma$$

or by the approximate equations

$$\underline{p}_1 = \frac{N_1}{N} - 1.751\sqrt{\frac{N N_1 - N_1^2}{N^3}}, \qquad \bar{p}_1 = \frac{N_1}{N} + 2.327\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

For a sample of 2,500 and a sample percentage of 11, these limits will be .099 and .125. Thus the manufacturer might say, as in the other cases, that the interval .099-.125 has a chance of .95 of including the population figure.
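All of these intervals follow one rule: a share of the .05 is used to fix the lower limit and the rest to fix the upper limit. This is a sketch; `interval` and its parameter names are illustrative, a zero share means that side runs to 0 or to 1, and `NormalDist().inv_cdf` gives 1.7507 and 2.3263 where the text uses 1.751 and 2.327.

```python
from math import sqrt
from statistics import NormalDist

def interval(n1, n, tail_for_lower, tail_for_upper):
    """Limits with confidence coefficient 1 - (tail_for_lower + tail_for_upper).
    tail_for_lower: chance of a sample percentage this large or larger if p1
    were the lower limit; tail_for_upper: chance of one this small or smaller
    if p1 were the upper limit."""
    share = n1 / n
    s = sqrt((n * n1 - n1 ** 2) / n ** 3)   # sigma estimated from the sample
    q = NormalDist().inv_cdf
    lower = share - q(1 - tail_for_lower) * s if tail_for_lower > 0 else 0.0
    upper = share + q(1 - tail_for_upper) * s if tail_for_upper > 0 else 1.0
    return lower, upper

for tails in [(0.025, 0.025), (0.05, 0.0), (0.0, 0.05), (0.04, 0.01)]:
    lo, hi = interval(275, 2500, *tails)
    print(tails, f"{lo:.3f}-{hi:.3f}")
```

The four lines reproduce, to three decimals, the unbiased, lower-bounded, upper-bounded, and .04/.01 intervals of the text.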

The Best Interval. There are accordingly an infinite number of different ranges, all associated with a confidence coefficient of .95, that might be adopted in setting up confidence limits for the population parameter. Is there any one range that might be designated the “best”? Before answering this question, consider the various values covered by the different intervals.

If the confidence interval given by the limits

$$\frac{N_1}{N} \pm 1.96\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

is compared with the confidence interval given by the limits

$$\frac{N_1}{N} - 1.645\sqrt{\frac{N N_1 - N_1^2}{N^3}} \qquad\text{and}\qquad 1$$

it is found that for an actual population value of, say, 10 per cent the latter interval would include the values above 10 per cent, especially the higher values such as 13 per cent, 14 per cent, etc., much more frequently than the former would. For the latter always runs up to 100 per cent no matter what the sample percentage, while the former runs up as far as 13 per cent, say, only if the sample value is as high as 11.68 per cent,¹ which, on the assumption that the population value is 10 per cent, is likely to be a rare phenomenon indeed.²

For .04 of the normal curve lies above 1.751σ from the mean and .01 lies below −2.327σ from the mean.

1 This is calculated from the exact formula

$$\bar{p}_1 = \frac{N_1 + 1.92 + 1.96\sqrt{.96 + N_1 - N_1^2/N}}{N + 3.84}$$

2 If p₁ = 10 per cent, then σ = .006, or .6 per cent, so that

$$u = \frac{11.68\ \text{per cent} - 10\ \text{per cent}}{.6\ \text{per cent}} = 2.80$$

and the probability of a normally distributed variate exceeding its mean by more than 2.80 times σ is only .0026.
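The figure of 11.68 per cent can be recovered by solving the exact upper-limit formula for the sample percentage at which it reaches 13 per cent. This sketch uses simple bisection, relying on the upper limit increasing with the sample percentage; the names `exact_upper` and `share_reaching` are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def exact_upper(n1, n, z=1.96):
    """Exact upper limit (N1 + z^2/2 + z*sqrt(z^2/4 + N1 - N1^2/N)) / (N + z^2)."""
    return (n1 + z ** 2 / 2 + z * sqrt(z ** 2 / 4 + n1 - n1 ** 2 / n)) / (n + z ** 2)

def share_reaching(target, n, z=1.96):
    """Sample percentage N1/N at which the exact upper limit equals target."""
    lo, hi = 0.0, 1.0
    for _ in range(60):                       # bisection; 60 halvings is ample
        mid = (lo + hi) / 2
        if exact_upper(mid * n, n, z) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

share = share_reaching(0.13, 2500)
u = (share - 0.10) / 0.006                    # deviation in units of sigma = .006
tail = 1 - NormalDist().cdf(u)
print(f"{100 * share:.2f} per cent, u = {u:.2f}, probability = {tail:.4f}")
```

Because u is not rounded to 2.80 before the normal table is consulted, the tail probability comes out near .0025 rather than exactly .0026.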
On the other hand, the second type of interval always has a larger value for its lower limit than the first, and it will therefore include values below 10 per cent (the actual population value) less frequently than the first type of interval. If the same type of comparison were made between the interval given by the limits

$$\frac{N_1}{N} \pm 1.96\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

and the interval given by the limits 0 and

$$\frac{N_1}{N} + 1.645\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

it would be found that the former would include values of p₁ below the actual value of 10 per cent much less frequently than the latter, while the latter would include values above 10 per cent less frequently.

In the problem illustrated it would seem that the manufacturer would prefer the interval running from 0 to

$$\bar{p}_1 = \frac{N_1}{N} + 1.645\sqrt{\frac{N N_1 - N_1^2}{N^3}}$$

for this would put the best face on his product. It would include the true value 95 per cent of the time, just as the other intervals would, and it would put the upper limit no higher than necessary. The buyer would be at no disadvantage if this method of stating the limits were employed. For if he understood their derivation he would know that 5 times out of 100 he might get a lot in which the percentage of defective parts exceeded the stated upper limit, and he would not buy unless he were willing to run this risk.

Influence of the Size of the Sample on the Confidence Interval. It will be noted that in the foregoing analysis the size of an interval depended upon the value of σ, and this in turn depended on the value of N, the size of the sample. Since σ = √(p₁p₂/N), it is seen that the larger the sample, the smaller the value of σ, and hence the narrower the confidence interval. Because σ varies inversely as the square root of N, quadrupling the size of the sample halves the width of the interval. That is, for a given confidence coefficient, say a probability of .95, the larger the sample, the more closely the population value may be estimated.
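The effect of N can be tabulated directly from the half-width 1.96σ. This sketch assumes a population percentage of .11 for illustration; the sample sizes chosen are arbitrary.

```python
from math import sqrt

def half_width(p1, n, z=1.96):
    """Half-width z * sqrt(p1 * p2 / N) of an interval for a sample percentage."""
    return z * sqrt(p1 * (1 - p1) / n)

for n in (100, 625, 2500, 10000):
    print(f"N = {n:>6,}: +/- {100 * half_width(0.11, n):.2f} per cent")
```

Each step in the list multiplies N by about four, and the half-width falls by about half each time.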

Influence of the Size of the Confidence Coefficient on the Confidence Interval. The size of the confidence interval is also dependent on the size of the confidence coefficient. For example, if in setting up an unbiased confidence interval the manufacturer had been willing to have the interval cover the population value only 90 times out of 100, he could have set up limits given by N₁/N ± 1.645σ instead of N₁/N ± 1.96σ.¹ Accordingly, by choosing a lower confidence coefficient, the manufacturer could have narrowed his confidence interval. The same is true for other types of confidence intervals.
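The narrowing can be read off the normal quantiles: a coefficient of .90 needs 1.645σ on each side where .95 needs 1.96σ, so the unbiased interval shrinks by about 16 per cent. This is a sketch; `z_for` is an illustrative name.

```python
from statistics import NormalDist

def z_for(confidence):
    """Two-sided normal multiplier for a given confidence coefficient."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for c in (0.90, 0.95, 0.99):
    print(f"{c:.2f}: N1/N +/- {z_for(c):.3f} sigma")
print(f"width ratio .90/.95: {z_for(0.90) / z_for(0.95):.3f}")
```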

A Single Estimate of p₁. The foregoing has had to do with the determination of a range of reasonable values for p₁. In some instances, however, the manufacturer may desire a single estimate of the population percentage. He may wish, for example, to report to a buyer some measure of the quality of the lot being sold, and an estimate of the percentage of defective parts in the lot may be a suitable figure for this purpose. Estimation of population parameters by the method of maximum likelihood was outlined in the previous chapter. Consider how it may be applied to the present instance.

In estimating p₁ from the sample result, the manufacturer might proceed as follows: He might turn to Fig. 53 or to its more general form, Fig. 54, and pick out the point on the N₁/N axis representing his sample percentage. The chart would have to be one for which the size of the sample, N, was the same as that taken by the manufacturer. Then he could proceed in the direction of increasing p₁ values until he had reached the highest point on the surface, i.e., until he found the value of p₁ that made the probability of the given sample a maximum. If he had taken a sample of 10 items, for example, and 2 had been found defective, he would start at the point .2 on the N₁/N axis of Fig. 53 and proceed perpendicularly to this axis until he had reached the value p₁ = .2. There he would find that the probability of his sample result would be a maximum. The maximum-likelihood estimate of p₁ would be .2, the same as the sample percentage. This, and all estimates made in this way, are called “maximum-likelihood estimates,” since they choose the value of the population parameter, in this case p₁, so that the probability, or likelihood, of the given sample result, here N₁/N, is a maximum.

1 For the probability of a normally distributed variate deviating from its mean by 1.645σ or more in either direction is just .10.

In general, the maximum-likelihood estimate of a population percentage is the sample percentage. This may be demonstrated mathematically as follows: Equation (1) shows that if the percentage of attribute A in the population is p₁, the probability of a sample of N having N₁ A's is¹

$$P = \frac{N!}{N_1!\,(N - N_1)!}\,p_1^{N_1}(1 - p_1)^{N - N_1}$$

According to the differential calculus, the value of p₁ that will maximize this probability for given values of N₁ and N must satisfy the condition²

$$\frac{dP}{dp_1} = \frac{N!}{N_1!\,(N - N_1)!}\left[N_1 p_1^{N_1 - 1}(1 - p_1)^{N - N_1} - p_1^{N_1}(N - N_1)(1 - p_1)^{N - N_1 - 1}\right] = 0$$

which, on division by the common factor $\frac{N!}{N_1!\,(N - N_1)!}\,p_1^{N_1 - 1}(1 - p_1)^{N - N_1 - 1}$, reduces to

$$N_1(1 - p_1) - p_1(N - N_1) = 0 \qquad\text{or}\qquad p_1 = \frac{N_1}{N}$$

If the maximum-likelihood estimate of p₁ is written p̂₁ to distinguish it from the population value itself, then p̂₁ = N₁/N is the maximum-likelihood estimate of p₁; that is, the sample percentage N₁/N is the maximum-likelihood estimate of the population percentage p₁.
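The calculus result can be checked by brute force: evaluate the logarithm of the binomial probability over a fine grid of p₁ values and pick the largest. This is a sketch; working with the logarithm, as is usual, avoids overflow for N = 2,500, and the grid spacing of .001 is an arbitrary choice.

```python
from math import lgamma, log

def log_likelihood(p1, n1, n):
    """Log of N!/(N1!(N-N1)!) * p1**N1 * (1 - p1)**(N - N1)."""
    log_coeff = lgamma(n + 1) - lgamma(n1 + 1) - lgamma(n - n1 + 1)
    return log_coeff + n1 * log(p1) + (n - n1) * log(1 - p1)

def grid_mle(n1, n, steps=1000):
    """Grid value of p1 in (0, 1) that maximizes the likelihood."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: log_likelihood(p, n1, n))

print(grid_mle(2, 10), grid_mle(275, 2500))   # 0.2 0.11
```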


1 Here p₂ is replaced by its equal 1 − p₁.

2 Generally the logarithm of the probability is maximized, but here it is just as easy to differentiate the function itself.

SAMPLES FROM RELATIVELY SMALL POPULATIONS

One of the fundamental assumptions of the foregoing analysis was that the population was very large relative to the sample taken. For it was on this basis that the probability of a given attribute in the population was assumed not to be materially affected by the withdrawal of various members of the sample. It was argued, for example, that, if a sample of 2,500 parts was taken from a lot of 500,000 parts in which the actual number of defective parts was 10 per cent, the percentage of defectives in the 499,000 parts left after 1,000 parts had been drawn would still be close to 10 per cent (actually, 49,000/499,000, or 9.82 per cent) even if all the 1,000 parts drawn turned out to be defective. Hence the analysis proceeded on the assumption that the percentage of defective parts in the population remained unaffected as the sample items were taken out. That is, the sampling discussed was sampling with replacements. It is the purpose of this section to consider how the analysis must be modified when nonreplacement of the sample items is taken into account.

When the population is of such a size relative to the sample that consideration must be given to the effects of the withdrawal of the sample items on the percentage of a given attribute in the population, then the distribution of a sample percentage can no longer be adequately described by the binomial distribution. Instead, as may be shown by an analysis similar to that of Chap. IV, the mathematical law describing the distribution of a sample percentage when replacements are not made is the hypergeometrical distribution. Thus, if a random sample of N cases is taken from a population consisting of S cases and if the percentage of cases in the population having attribute A is p₁ and the percentage not having the attribute A is 1 − p₁ = p₂, then the probability of a sample (drawn without replacements) containing N₁ A's, that is, a sample percentage N₁/N, is given by the formula

$$P\!\left(\frac{N_1}{N}\right) = \frac{(p_1S)!\,(p_2S)!\,(S - N)!\,N!}{(p_1S - N_1)!\,(p_2S - N + N_1)!\,S!\,N_1!\,(N - N_1)!} \qquad (4)$$
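Formula (4) is the ratio of counts C(p₁S, N₁)·C(p₂S, N − N₁)/C(S, N) written out in factorials. This sketch evaluates it both ways with exact fractions and compares the result with the binomial probability used earlier; the population sizes are illustrative, and p₁S is assumed to be a whole number.

```python
from fractions import Fraction
from math import comb, factorial

def hypergeometric_eq4(n1, n, s, a):
    """Formula (4), with a = p1*S cases having A and b = p2*S cases without it."""
    b = s - a
    num = factorial(a) * factorial(b) * factorial(s - n) * factorial(n)
    den = (factorial(a - n1) * factorial(b - n + n1) * factorial(s)
           * factorial(n1) * factorial(n - n1))
    return Fraction(num, den)

def hypergeometric_comb(n1, n, s, a):
    """The same probability as a ratio of binomial coefficients."""
    return Fraction(comb(a, n1) * comb(s - a, n - n1), comb(s, n))

def binomial(n1, n, p1):
    """Probability of N1 A's when sampling with replacement."""
    return comb(n, n1) * p1 ** n1 * (1 - p1) ** (n - n1)

# Sample of 10 from a population of 100 that is 10 per cent A
for n1 in range(4):
    print(n1, round(float(hypergeometric_eq4(n1, 10, 100, 10)), 4),
          round(binomial(n1, 10, 0.10), 4))
```

With S only ten times N the two columns differ visibly; as S grows with p₁ fixed, the hypergeometrical probabilities approach the binomial ones.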

This substitution of the hypergeometrical distribution for the 
binomial distribution is the fundamental modification that must 
be made in the previous analysis when sampling is made without 
replacements. 

If both the population and the sample are relatively small, then either the hypergeometrical distribution itself may be employed to calculate the probabilities of various sample results or one of the Pearsonian frequency curves that approximates this distribution can be used for this purpose. Calculations of this kind are quite complicated, however, and will not be further considered here. The interested reader is referred to the Appendix to Chap. IV (page 81) for a discussion of the appropriate Pearsonian curve to fit and to pages 127 to 131 for a method of measuring probabilities from such a curve.

When the population and sample are both moderately large, the hypergeometrical distribution can be approximated by a normal distribution, which again greatly simplifies the analysis. Specifically, if the sample N is less than half the population S, if 1/√N and hence 1/√(S − N) are so small as to be negligible, and if Np₁ and Np₂ are both moderately large, then the hypergeometrical distribution can be approximated by a normal distribution whose mean is p₁ and whose standard deviation is given by the equation¹

$$\sigma = \sqrt{\frac{p_1 p_2}{N}\left(1 - \frac{N}{S}\right)} \qquad (5)$$

When it is possible to use the normal distribution in place of the hypergeometrical distribution, the analysis proceeds in almost exactly the same manner as described in the preceding sections. The only difference is that, instead of using the value σ = √(p₁p₂/N), the value σ = √((p₁p₂/N)(1 − N/S)) is used in its place. It will be noticed that the effect of this modification is to reduce the value of σ employed, since 1 − N/S is necessarily less than 1. This means that confidence intervals, regions of acceptance, and the like, will be smaller when account is taken of the size of the population. As S becomes large relative to N, however, the factor 1 − N/S becomes practically 1 and the size of the population may be disregarded. The analysis then becomes that of the previous sections of this chapter.
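A sketch of the modified standard deviation, assuming Eq. (5) as written, with the factor 1 − N/S (the exact hypergeometrical factor is (S − N)/(S − 1), which differs negligibly for large S); the function name and the population sizes are illustrative.

```python
from math import sqrt

def sigma_sample_percentage(p1, n, s=None):
    """sqrt(p1*p2/N), multiplied inside the root by (1 - N/S) when the
    population size S is given (sampling without replacement, Eq. (5))."""
    variance = p1 * (1 - p1) / n
    if s is not None:
        variance *= 1 - n / s
    return sqrt(variance)

for s in (None, 500_000, 25_000, 10_000):
    label = "unlimited" if s is None else f"{s:,}"
    print(f"S = {label:>9}: sigma = {sigma_sample_percentage(0.10, 2500, s):.5f}")
```

For the lot of 500,000 the correction changes σ only in the fifth decimal, which is why the earlier sections could ignore it.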

SAMPLES FROM POPULATIONS IN WHICH p₁ IS VERY SMALL

There are some instances in which the chance of a “favorable” occurrence is very small. If a large enough sample is taken, however, a few cases will be found. The example most commonly referred to is that of Bortkewitsch, who found that for 10 Prussian army corps the average number of deaths occurring from horse kick during 20 years was .61 per corps per year. Another example is the counting of cells in certain biological tests and experiments where the chance of occurrence of cells of a certain type on a given plate or a given section of a plate is very small.²

Whenever the chance of a favorable case is very small, the binomial distribution is not well approximated by the normal distribution unless N is very large.³

1 Cf. Bowley, A. L., Elements of Statistics, 5th ed., pp. 282-284.

2 Cf. Fisher, R. A., Statistical Methods for Research Workers, par. 15.

3 See p. 73.

In these instances the binomial distribution is better approximated by the Poisson distribution.¹ This has the form²

$$P(N_1) = \frac{e^{-m}\,m^{N_1}}{N_1!}$$

which may also be written

$$P(N_1) = \frac{e^{-Np_1}\,(Np_1)^{N_1}}{N_1!} \qquad (6)$$

where P(N₁) represents the probability of N₁ cases out of N, p₁ is the probability of a favorable case, m = Np₁, and e = 2.718+ is the base of Napierian logarithms. For example, if p₁ = .003 and N = 1,000, then m = 3 and the probabilities of 0, 1, 2, 3, 4, . . . favorable occurrences are³

N₁     e⁻³ 3^N₁ / N₁!          Probability
0      e⁻³ 3⁰ / 1              .0498
1      e⁻³ 3¹ / 1              .1494
2      e⁻³ 3² / 2              .2240
3      e⁻³ 3³ / 6              .2240
4      e⁻³ 3⁴ / 24             .1680
5      e⁻³ 3⁵ / 120            .1008
6      e⁻³ 3⁶ / 720            .0504
7      e⁻³ 3⁷ / 5,040          .0216
8      e⁻³ 3⁸ / 40,320         .0081
9      e⁻³ 3⁹ / 362,880        .0027
10     e⁻³ 3¹⁰ / 3,628,800     .0008
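The table can be recomputed from Eq. (6) and compared with the exact binomial probabilities for N = 1,000 and p₁ = .003; the same list also gives the cumulative sums worked out by hand in the text. This is a sketch; the function names are illustrative.

```python
from math import comb, exp, factorial

def poisson(k, m):
    """Eq. (6): e**(-m) * m**k / k!, with m = N * p1."""
    return exp(-m) * m ** k / factorial(k)

def binomial(k, n, p):
    """Exact binomial probability of k favorable cases out of n."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

N, P1 = 1000, 0.003
m = N * P1
for k in range(11):
    print(f"{k:>2}  Poisson {poisson(k, m):.4f}  binomial {binomial(k, N, P1):.4f}")

two_or_fewer = sum(poisson(k, m) for k in range(3))
six_or_more = 1 - sum(poisson(k, m) for k in range(6))
print(f"P(N1 <= 2) = {two_or_fewer:.4f}, P(N1 >= 6) = {six_or_more:.4f}")
```

Carried at full precision, the probability of six or more comes out as .0839; summing the four-place table values gives .0840.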


1 Also known as the “law of small numbers” or “law of small chances.”

2 The derivation of this approximation is given in the Appendix to this chapter.

3 These probabilities may be computed directly by noting that 
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These are graphed in Fig. 62. 



Fig. 62.—The Poisson distribution, tn = 3, pi = .003, N = 1000 [see Eq. (6)]. 

If it is desired to find the probability of N i exceeding or falling 
short of a given value, it is necessary merely to sum the individual 
probabilities in question. For example, if pi = .003 and 
N = 1,000 as in the table above, the probability that two or 
fewer favorable cases will occur is equal to 

.0498 + .1494 + .2240 = .4232. 

The probability that six or more favorable cases will occur equals 
.0504 + .0216 + .0081 + .0027 + .0008 + . . . , which equals 
approximately .0836. Since the table is not completed in this 
direction owing to the small probabilities, the probability in 
question is more accurately computed by subtracting from 1 the 
probability of five or less favorable occurrences. The latter 
probability equals 

.0498 + .1494 + .2240 + .2240 + .1680 + .1008 - .9160. 

Hence a better approximation to the probability of six or more 
favorable cases is 1 — .9160 — .0840. 

The use of the Poisson distribution in testing a hypothesis 
about a population may be illustrated as follows: Suppose it is 

logic e'~ 3 — —3 logic e = — 3(.4343) = —1.3029 

and therefore e“ 3 = .0498, or they may be obtained from Karl Pearson’s 
Tables for Statisticians and Biometricians, Table LI, 
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wished to determine the chance of dying from a new type of 
inoculation, say against smallpox. The existing method, it will be assumed, results in the death of not more than .5 per cent of those inoculated, and the new method will not be adopted widely if its death rate appears to be significantly more than that. A thousand test cases are made, and of these seven die. Is it reasonable on the basis of these tests to reject the hypothesis that the true death rate from the inoculation is not more than .5 per cent?

To answer this question, set the coefficient of risk at approximately .05, and place the region of rejection entirely at the upper end of the distribution (for the risk of accepting the hypothesis when the death rate is actually higher than .5 per cent is to be made a minimum). It will be noted that the Poisson distribution, like the binomial distribution, is discrete, so that it may not be possible to obtain a region of rejection whose probability is exactly .05. The region coming as close as possible to this standard will be adopted.

If .5 per cent is the true death rate and samples of 1,000 are taken, the distribution of sample death rates will be approximated by a Poisson distribution for which m = (1,000)(.005) = 5. The individual probabilities for the more likely sample results will thus be:¹

N₁   e⁻⁵ 5^(N₁)/N₁!
 0       .0067
 1       .0337
 2       .0842
 3       .1404
 4       .1755
 5       .1755
 6       .1462
 7       .1044
 8       .0653
 9       .0363
10       .0181
11       .0082
12       .0034
13       .0013
14       .0005
15       .0002

¹ These are taken from Pearson’s Tables for Statisticians and Biometricians, Table LI.

The total probability of a sample containing 8 or fewer deaths is .9319; thus the probability of a sample containing 9 or more deaths is .0681. This is close enough to .05 to take values of N₁ equal to 9 or more as the desired region of rejection. The actual sample has 7 deaths, which is not in the region of rejection; therefore the hypothesis of a true death rate of 5 per 1,000 cannot be rejected. From the figures above it is seen that a sample could have as many as 8 deaths without causing the hypothesis of a general rate of 5 deaths per 1,000 to be rejected.
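The choice of the region of rejection can be sketched as a search for the cutoff c whose upper-tail probability P(N₁ ≥ c) is nearest the chosen risk of .05. This is a minimal sketch with hypothetical function names, and taking "nearest" in absolute difference is our reading of the text's "as close as possible to this standard." For m = 5 the candidates 9 (.0681) and 10 (.0318) are almost equally far from .05, so the choice is very nearly a tie.

```python
import math


def poisson_pmf(k: int, m: float) -> float:
    """e^(-m) m^k / k!, evaluated through logarithms."""
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def upper_tail(c: int, m: float) -> float:
    """P(N1 >= c) = 1 - P(N1 <= c - 1)."""
    return 1.0 - sum(poisson_pmf(j, m) for j in range(c))


def rejection_cutoff(m: float, risk: float = 0.05, c_max: int = 100) -> int:
    """Cutoff c whose upper tail P(N1 >= c) comes closest to the chosen risk."""
    return min(range(1, c_max), key=lambda c: abs(upper_tail(c, m) - risk))


m = 1000 * 0.005  # m = N * p1 = 5 if the true death rate is .5 per cent
c = rejection_cutoff(m)
print(c, round(upper_tail(c, m), 4))  # 9 0.0681

observed = 7
print("reject" if observed >= c else "cannot reject")  # cannot reject
```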

An upper bound for the true death rate that will have a probability of .95 of including the true rate may be found as follows: Note first that the sample return is 7 out of 1,000. Then examine Pearson’s tables to see for what distribution the sum of the probabilities of 0 to 6 deaths per 1,000 is just .05. This will be found to be the distribution for which m = 11.8. Since the sample contains 1,000 cases and m = Np₁, it follows that the .95 upper bound for p₁, given a sample rate of .007, is .0118. It may be said, therefore, that the chances are 95 out of 100 that the range 0-1.18% covers the true percentage.
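The search through Pearson's tables can be imitated by bisection on m. This sketch follows the text's own criterion (the probabilities of 0 to 6 deaths summing to .05), and the function names are hypothetical. Many later treatments instead set the probability of 7 or fewer deaths equal to .05 for an observed count of 7, which gives a larger bound of about m = 13.15; calling `poisson_upper_bound(7)` shows that variant.

```python
import math


def poisson_cdf(k: int, m: float) -> float:
    """P(N1 <= k) for a Poisson distribution with mean m."""
    return sum(math.exp(j * math.log(m) - m - math.lgamma(j + 1)) for j in range(k + 1))


def poisson_upper_bound(max_count: int, tail: float = 0.05) -> float:
    """The m for which P(N1 <= max_count) equals `tail`, found by bisection.

    P(N1 <= max_count) falls steadily as m rises, so the root is unique."""
    lo, hi = 1e-9, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if poisson_cdf(max_count, mid) > tail:
            lo = mid  # the probability is still above .05, so m must be larger
        else:
            hi = mid
    return (lo + hi) / 2


N = 1000
m_upper = poisson_upper_bound(6)  # the text's criterion: 0 to 6 deaths sum to .05
print(round(m_upper, 2), round(m_upper / N, 4))  # 11.84 0.0118
```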

The mean of a Poisson distribution is m, its variance is also m, its β₁ = 1/m, and its β₂ = 1/m + 3. When N is so large that Np₁ = m is also large, the Poisson distribution is fairly well approximated by the normal curve. This is merely another way of saying that even when p₁ is very small, and hence p₂ − p₁ is relatively large, the binomial distribution approaches the normal distribution if N is big enough.¹ In this instance it makes little difference whether the variance of the normal distribution is taken as m = Np₁ or as Np₁p₂, since p₂ is very close to 1.
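The closeness of the normal curve for large m can be checked by comparing each Poisson probability with the ordinate of a normal curve having the same mean and variance, m. The gap measure below (the largest absolute difference within four standard deviations of m, divided by the height of the Poisson peak) is an illustrative choice, not one used in the text; it shrinks steadily as m grows.

```python
import math


def poisson_pmf(k: int, m: float) -> float:
    return math.exp(k * math.log(m) - m - math.lgamma(k + 1))


def normal_ordinate(x: float, mean: float, var: float) -> float:
    """Height of the normal curve with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def relative_gap(m: float) -> float:
    """Largest |Poisson - normal| over counts within 4 sd of m, divided by the
    height of the Poisson peak (an illustrative measure, not one from the text)."""
    sd = math.sqrt(m)
    counts = range(max(0, int(m - 4 * sd)), int(m + 4 * sd) + 1)
    peak = max(poisson_pmf(k, m) for k in counts)
    gap = max(abs(poisson_pmf(k, m) - normal_ordinate(k, m, m)) for k in counts)
    return gap / peak


for m in (3, 30, 300):
    print(m, round(relative_gap(m), 3))
```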

APPENDIX 

Derivation of the Poisson Distribution and Its Properties. 

The Poisson distribution is derived as follows: According to the binomial equation the probability of N₁ cases out of N is

$$P(N_1) = \frac{N!}{N_1!\,N_2!}\,p_1^{N_1}p_2^{N_2} \qquad (1)$$

where p₁ + p₂ = 1 and N₁ + N₂ = N.

¹ See p. 73.

² See p. 43.
By dividing numerator and denominator by N₂! = (N − N₁)! and setting p₂ = 1 − p₁ and N₂ = N − N₁, this may be written

$$P(N_1) = \frac{N(N-1)\cdots(N-N_1+1)}{N_1!}\,p_1^{N_1}(1-p_1)^{N-N_1} \qquad (2)$$


Suppose that p₁ is very small but that N is large enough so that Np₁ has a value of, say, at least .1. For example, if p₁ = .003, let N = 1,000, so that the value of Np₁ is 3. Represent Np₁ by m, so that p₁ = m/N. If this value of p₁ is put in the foregoing equation, then

$$P(N_1) = \frac{N(N-1)(N-2)\cdots(N-N_1+1)}{N_1!}\left(\frac{m}{N}\right)^{N_1}\left(1-\frac{m}{N}\right)^{N-N_1} \qquad (3)$$

By distributing $(1/N)^{N_1}$ throughout the first N₁ factors and by splitting $(1 - m/N)^{N-N_1}$ into its two components, Eq. (3) gives

$$P(N_1) = \frac{1\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots\left(1-\frac{N_1-1}{N}\right)}{N_1!}\,m^{N_1}\left(1-\frac{m}{N}\right)^{N}\left(1-\frac{m}{N}\right)^{-N_1} \qquad (4)$$

But if N is large, as it must be to make Np₁ equal to .1 or more, the values of 1/N, 2/N, . . . , (N₁ − 1)/N will be negligible. The value of $(1 - m/N)^{N}$ will be approximately $e^{-m}$, and the value of $(1 - m/N)^{-N_1}$ will be practically 1, since m/N is practically negligible. Hence

$$P(N_1) = \frac{m^{N_1}}{N_1!}\,e^{-m} \qquad (5)$$

or, as it may be written,

$$P(N_1) = \frac{m^{N_1}}{N_1!}\exp[-m] \qquad (6)$$

This is known as the Poisson distribution, after the man who first developed it.
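The limiting argument of Eqs. (3) to (6) can be watched numerically: hold m = Np₁ = 3 fixed, let N grow, and compare the binomial probabilities of Eq. (1) with the Poisson probabilities of Eq. (6). This is a minimal sketch with hypothetical function names; `math.comb` requires Python 3.8 or later.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Eq. (1): N! / (N1! N2!) p1^N1 p2^N2."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, m: float) -> float:
    """Eq. (6): m^N1 / N1! exp[-m]."""
    return m ** k / math.factorial(k) * math.exp(-m)


def max_gap(n: int, m: float = 3.0, k_max: int = 10) -> float:
    """Largest difference between Eqs. (1) and (6) for N1 = 0..k_max, with p1 = m / N."""
    p1 = m / n
    return max(abs(binomial_pmf(k, n, p1) - poisson_pmf(k, m)) for k in range(k_max + 1))


for n in (10, 100, 1_000, 10_000):
    print(f"N = {n:>6}: largest difference {max_gap(n):.5f}")
```

The largest difference falls steadily as N increases with m held fixed, which is the behavior the derivation relies on.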
The mean of a Poisson distribution is m, which may be shown as follows:

First note that

$$\sum_{N_1=0}^{N}\frac{m^{N_1}}{N_1!}\exp[-m] = 1 \qquad (7)$$

since the sum of the total probabilities of a distribution equals 1.

Next note that the mean of the Poisson distribution is by definition

$$\text{Mean } N_1 = \sum_{N_1=0}^{N} N_1\,\frac{m^{N_1}}{N_1!}\exp[-m] \qquad (8)$$

Note further that if N₁ in the numerator is canceled against the N₁ factor in N₁! and if m is factored out of $m^{N_1}$, Eq. (8) becomes (the term for N₁ = 0 being zero)

$$\text{Mean } N_1 = m\sum_{N_1=1}^{N}\frac{m^{N_1-1}}{(N_1-1)!}\exp[-m] \qquad (9)$$

But if N₁ − 1 is set equal to N₁′ and N − 1 to N′, the sum term becomes

$$\sum_{N_1'=0}^{N'}\frac{m^{N_1'}}{N_1'!}\exp[-m] \qquad (10)$$

which by Eq. (7) equals 1. Hence

Mean N₁ = m

Similarly, the variance of N₁ equals m, which may be shown as follows:

By definition,

$$\sigma_{N_1}^2 = \sum_{N_1=0}^{N} N_1^2\,\frac{m^{N_1}}{N_1!}\exp[-m] - m^2$$

But $N_1^2 = N_1(N_1 - 1) + N_1$; therefore,

$$\sigma_{N_1}^2 = \sum_{N_1=0}^{N} N_1(N_1-1)\,\frac{m^{N_1}}{N_1!}\exp[-m] + \sum_{N_1=0}^{N} N_1\,\frac{m^{N_1}}{N_1!}\exp[-m] - m^2$$

The first term of the right-hand side of this expression equals (the terms for N₁ = 0 and 1 being zero)

$$m^2\sum_{N_1=2}^{N}\frac{m^{N_1-2}}{(N_1-2)!}\exp[-m]$$

which as above equals m², and the second term equals m as demonstrated in the previous paragraph. Hence

$$\sigma_{N_1}^2 = m^2 + m - m^2$$

or

$$\sigma_{N_1}^2 = m$$

In the same manner it can be shown that

$$\mu_3 = m \quad\text{and}\quad \mu_4 = m + 3m^2 \qquad (11)$$

Hence β₁ = μ₃²/μ₂³ = 1/m and β₂ = μ₄/μ₂² = 1/m + 3.

These latter formulas suggest that, if N and hence m is large, the Poisson distribution approaches the normal form; for β₁ then equals approximately 0, and β₂ equals approximately 3. In fact, this approach to normality with increasing size of m may be mathematically demonstrated in a manner essentially the same as that used to demonstrate the approach of the binomial distribution to normality.
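The moments just derived can be checked numerically by summing the Poisson probabilities term by term. The sketch truncates the sums at N₁ = 200, an assumption that is harmless for moderate m because the omitted tail is vanishingly small; β₁ and β₂ are then formed from μ₂, μ₃, and μ₄ as above.

```python
import math


def poisson_moments(m: float, k_max: int = 200) -> tuple[float, float, float, float]:
    """Mean and the second, third, and fourth moments about the mean,
    summed term by term over N1 = 0..k_max."""
    probs = [math.exp(k * math.log(m) - m - math.lgamma(k + 1)) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    mu2, mu3, mu4 = (sum((k - mean) ** r * p for k, p in enumerate(probs)) for r in (2, 3, 4))
    return mean, mu2, mu3, mu4


m = 5.0
mean, mu2, mu3, mu4 = poisson_moments(m)
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2
print(round(mean, 6), round(mu2, 6), round(mu3, 6), round(mu4, 6))  # 5.0 5.0 5.0 80.0
print(round(beta1, 6), round(beta2, 6))  # 0.2 3.2
```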



CHAPTER X 


SAMPLING FROM CONTINUOUS NORMAL POPULATIONS 
I. VARIOUS SAMPLING DISTRIBUTIONS 

The foregoing chapter was concerned with attributes that are qualitative, such as “defective” versus “nondefective,” “for” versus “against,” “black” versus “white,” or attributes that have only discrete numerical values. This and the following chapter will be concerned with attributes that may theoretically vary continuously, such as the heights of individuals, yields of wheat, and the like.

The theory of sampling for a continuous variable has been most completely worked out for a normal population, i.e., a population in which the probability, or relative frequency, of a given value of the variable is measured by the normal probability curve.
Because of this and because hypotheses of normal populations 
arise frequently in practical problems, the present chapter will 
develop the analysis in considerable detail. 

The argument will apply strictly to a hypothetical infinite 
population. It might, for example, apply to the infinite set 
of electric-light bulbs that could be produced by a given process 
if that process were to be used indefinitely without modification. 
It might also apply to the infinite set of crops of a given variety 
that might be yielded by repeated farming of a given type of 
land with a given type of treatment. Again it might apply 
to the infinite set of white adult males living now and in the 
future. In fact, it might apply to the results of many types of 
repetitive or continuous physical, biological, or social processes. 
The argument will also be applicable without serious error to a 
large finite population that is distributed approximately in a 
normal manner, such as the heights of existing white adult males. 
In this instance the population will be presumed to be so large relative to the size of the sample that the withdrawal of the sample does not materially change the distribution of probabilities in the population. Thus, if the probability of a case lying between 1.2 and 1.3 is .09, say, before the sampling is begun, it will be assumed to remain equal to .09 throughout the whole of the sampling process.

The analysis will assume that the process of sampling is a random one. This implies, as previously, that the results, or rather the distribution of the results, of repeated sampling from the given population can be predicted with reasonable accuracy by the use of an appropriate mathematical model. The principal task of the analysis will be to develop such a model.

The problem to be considered will be this: A random sample is obtained from a given normal population. Being normal, the population may be specified by its mean and standard deviation, for the β₁ of all normal populations is 0, and the β₂ is equal to 3. Hence the problem will be to make inferences about the mean and standard deviation of the population from knowledge of the mean and standard deviation of the sample.

To solve this and similar problems, it will be necessary to 
consider what would happen under repeated random sampling 
from the given population. For this purpose, such statistics as 
the mean and variance are selected, and a study is made of how 
these statistics vary from sample to sample. Such a study seeks 
in particular to estimate the relative frequencies with which the 
selected statistics will assume various values among the infinite 
set of samples of given size that might be drawn from the given 
population (with replacements of the samples if the population 
is finite). That is, it seeks to derive the sampling distributions 
of the selected statistics. 

To derive these sampling distributions by actually drawing 
a large number of random samples from the given population 
would be a tedious if not an impossible task in almost all cases. 
Fortunately, the derivation may be undertaken by a theoretical 
process similar to that by which the sampling distribution of 
percentages was derived in the preceding chapters. The steps in 
the theoretical argument are to find a mathematical model that 
appears to reproduce the conditions of sampling and then to 
derive the distributions of the selected statistics for this model 
by means of the probability calculus. 

On the assumption that the process of sampling is a random 
one, it is argued that the actual distributions of statistics among 
a large set of samples from the given population will tend to conform to the theoretical distributions worked out for the given mathematical model. The basis for this, it will be recalled,¹ is intuition and experience with respect to mass phenomena or what has been called the “law of large numbers.” The task
of the present chapter is to set up the appropriate mathematical 
model and to derive the theoretical sampling distributions for 
various statistics. 

DISTRIBUTION OF SAMPLES OF 2 FROM A NORMAL POPULATION 

For simplicity suppose that a sample contains only two cases, that is, N = 2. Of course, in practical work, samples rarely
contain such a small number of cases. Much is to be gained in 
the theoretical analysis, however, by taking samples of 2. For 
the essentials of the argument are the same for these samples 
of 2 as they are for large samples, but the argument is much 
simpler and easier to comprehend. If the argument for samples 
of 2 is clearly understood, the generalization for samples of larger 
size is not very difficult. 

In accordance with the assumption noted above, the population is assumed to be so large that the withdrawal of the sample does not materially change the distribution of probabilities in the population. Hence the withdrawal of two cases from a single population will be essentially the same as withdrawing one case from one population and another case from another population identical in all respects to the first. Similarly, repeated withdrawals of samples of 2 from a single population (with replacements of samples if the population is finite) will be essentially the same as repeated withdrawals of samples of 1 each from two identical populations. This suggests that the sampling distributions of means, variances, and other statistics of samples of 2 from a normal population may be derived from the following mathematical model:

The model is an arithmetical model instead of an algebraic one; this is intended to facilitate the exposition for the reader who does not have a ready knowledge of the calculus. However, algebraic equations will be given for all results obtained; and, for those who have the mathematical training, the translation of the numerical argument into its algebraic counterpart is given in the appendix of this chapter.

¹ Cf. Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, pp. 239-241. See also pp. 27-28.

The model supposes two populations identical in all respects; each population has 100,000 cases, which, when grouped in class intervals of 2, have the relative frequencies of a normal frequency distribution. These relative frequencies are given in Table 25. The mean of each population is 100, and its standard deviation is 10. Now suppose that each case of population I, like the one shown in Table 25, is combined without restriction with each case of population II, also like the one shown in Table 25, and suppose that the pairs of values so obtained are checked off on a scatter diagram such as that illustrated in Fig. 63. For example, if a case from population I (call it X₁) belongs to the interval 98- and a case from population II (call it X₂) belongs to the interval 102-, then this pair of cases (this sample of 2) will be checked off as belonging to the cell abcd of Fig. 63.

Fig. 63.—Probability of a joint sample X₂ = 102-104 and X₁ = 98-100.

Table 25.—Distribution of Probabilities for a Normal Population
(X̃ = 100 and σ̃ = 10)

Lower limits        Probabilities¹
of class            (frequencies relative
intervals           to the total of 100,000)
   58-                  .00002
   60-                  .00004
   62-                  .00009
   64-                  .00018
   66-                  .00035
   68-                  .00066
   70-                  .00121
   72-                  .00210
   74-                  .00354
   76-                  .00570
   78-                  .00885
   80-                  .01318
   82-                  .01887
   84-                  .02596
   86-                  .03431
   88-                  .04359
   90-                  .05320
   92-                  .06239
   94-                  .07033
   96-                  .07616
   98-                  .07926
  100-                  .07926
  102-                  .07616
  104-                  .07033
  106-                  .06239
  108-                  .05320
  110-                  .04359
  112-                  .03431
  114-                  .02596
  116-                  .01887
  118-                  .01318
  120-                  .00885
  122-                  .00570
  124-                  .00354
  126-                  .00210
  128-                  .00121
  130-                  .00066
  132-                  .00035
  134-                  .00018
  136-                  .00009
  138-                  .00004
  140-                  .00002

¹ These probabilities are derived by successive subtraction of the .2 intervals in the five-place normal probability area table in F. C. Mills and D. H. Davenport, A Manual of Problems and Tables in Statistics, pp. 199-203.

If such pairs of cases, or samples of 2, are formed without restriction, as is supposed, then according to the multiplication theorem for independent probabilities, the probability (relative frequency) of a pair of cases belonging to any one cell will be equal to the probability (relative frequency) of a case belonging to the interval from which the first member of the pair was selected, times the probability (relative frequency) of a case belonging to the interval from which the second member of the pair was selected. For example, if the probability of a case belonging to the interval 102- in population II is .07616 and the probability of a case belonging to the interval 98- in population I is .07926, then the probability of a pair of cases belonging to the cell 98- for X₁ and 102- for X₂, that is, cell abcd of Fig. 63, is

(.07926) (.07616) = .006036. 

The probability of any pair of cases among the set of all possible 
pairs of cases may thus be calculated by simple multiplication 
of two elementary probabilities. This has been done for the 
more probable pairs of cases, and the results are presented in 
Fig. 64. 
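Table 25 and the cell probability just computed can be reproduced from the normal curve directly. The sketch below uses `math.erf` for the area under the normal curve in place of the Mills and Davenport table; that the two agree to five places is an assumption checked here only for the entries printed. The names are hypothetical.

```python
import math


def normal_area_below(x: float, mean: float = 100.0, sd: float = 10.0) -> float:
    """Area under the normal curve to the left of x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def class_probability(lower: float, width: float = 2.0) -> float:
    """Probability of a case in [lower, lower + width), by subtracting two areas."""
    return normal_area_below(lower + width) - normal_area_below(lower)


# Table 25: class intervals 58-, 60-, ..., 140-
table25 = {lower: class_probability(lower) for lower in range(58, 142, 2)}
print(round(table25[98], 5), round(table25[102], 5))  # 0.07926 0.07616

# Multiplication theorem: X1 in 98- (population I) and X2 in 102- (population II).
p98, p102 = round(table25[98], 5), round(table25[102], 5)
print(round(p98 * p102, 6))  # 0.006036, as in the text
```

Multiplying the unrounded probabilities instead gives about .0060366, so the text's .006036 reflects the five-place entries of Table 25.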

It may appropriately be argued that this distribution of pairs of cases is a good prediction of the results that would be obtained by drawing one item at random from population I and another at random from population II, recording the results, and then replacing the cases and repeating the process. For
if the process of selection is a random one, each case in each 
population will be drawn just about as often as every other 
case; and if the selection of the first case does not influence the 
selection of the second, then no particular pair of cases will 
tend to occur any more frequently than any other pair of cases. 
As a consequence, the relative frequencies of various types of 
samples will, in repeated sampling, be approximately the same 
as the relative frequencies of various types of combinations among 
the set of all possible combinations of one item each from the two 
populations. 
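This argument can be checked by simulation. In the sketch below, Python's `random.gauss` stands in for drawing a case at random from each population; this is an assumption, since it samples a continuous normal curve rather than the 100,000 grouped cases of Table 25. The fraction of pairs falling in cell abcd is then compared with the computed .006036.

```python
import random


def simulate_cell_frequency(n_pairs: int, seed: int = 1) -> float:
    """Fraction of random pairs with X1 in [98, 100) and X2 in [102, 104)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(n_pairs):
        x1 = rng.gauss(100, 10)  # one case drawn from population I
        x2 = rng.gauss(100, 10)  # an independent case from population II
        if 98 <= x1 < 100 and 102 <= x2 < 104:
            hits += 1
    return hits / n_pairs


freq = simulate_cell_frequency(400_000)
print(f"simulated {freq:.5f} versus computed .006036")
```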

According to the argument presented above, the latter will 
also be a good approximation to the relative frequencies of 
various types of samples of 2 from a single population. For 
in this case, the original population can be viewed as population I, 
and the population remaining after the first case has been drawn 
can be viewed as population II, the form of the latter being 
practically the same as that of population I. Relative frequencies in Fig. 64 therefore constitute a good prediction of the
distribution of a very large number of samples of 2 from a normal 
population whose mean is 100 and whose standard deviation is 10. 

Properties of the Distribution. Its Circular Symmetry. The most striking feature of the set of samples represented in Fig. 64 is the symmetrical distribution of the samples. They tend to cluster most closely around the point whose coordinates are both equal to the mean value, 100, and to thin out evenly in all directions from that point. In fact, it will be noted that cells lying equally distant from this central point have approximately equal probabilities.

Fig. 64.—Probability distribution of all samples (N = 2) from Population . . .

If the cells had been made smaller, this circular symmetry 
would have been more evident. Careful study also shows that 
the probabilities in each row and each column tend to conform to 
a normal frequency distribution with the same mean and same 
standard deviation as the original population. 

Figure 64 is typical of a large set of samples from a normal 
population, whatever the size of the sample. In every case 
the various samples cluster most closely around the point whose 
coordinates are all equal to the mean of the population, and in 
every case the distribution of samples around this point conforms 
to a symmetrically circular (or spherical) pattern. 

Geometrical Measurement of the Mean of a Sample. Figure 64 designates a sample by indicating the interval to which case I belongs and the interval to which case II belongs. It may also be used to find the interval to which the mean of any sample belongs. This follows from the geometrical properties of the figure.

By definition the mean of any sample of two cases is

$$\bar X = \frac{X_1 + X_2}{2}$$

For a given value of X̄, this is the equation of a line in Fig. 64 that runs through the point X₂ = 2X̄ on the X₂-axis and the point X₁ = 2X̄ on the X₁-axis. All X₁X₂ sample combinations that lie on this line have the given mean value. For example, in Fig. 64 any sample point lying on line AB, such as points (100,90), (98,92), (94,96), (88,102), and (102,88), has a mean of 95. Coordinates are given in the order X₁, X₂. In general, the plane of Fig. 64 may be covered with a set of parallel lines, such as line AB, any one of which is the locus of all sample combinations having the same mean. Geometrically, the slope of all these parallel lines is such that an increase in X₁ is matched by an equal decrease in X₂, or vice versa, so that the mean of the two remains the same.

The parallel lines that could be drawn perpendicular to lines like AB (for example, like GH) represent samples such that, whenever X₁ increases by a given quantity, X₂ also increases by the same quantity. Thus, on line GH, if X₁ is 94, X₂ is 86, whereas if X₁ is 96, X₂ is 88; as X₁ has increased by 2, X₂ has also increased by 2. Of this latter group of sample combinations one set is of special interest, viz., that for which X₁ and X₂ have the same value, or, algebraically, X₁ = X₂. In Fig. 64 all samples lying on line CD, which passes through the origin and bisects the angle between the axes, have values of X₁ = X₂. In fact, line CD is the locus of all sample combinations in which X₁ = X₂. But in addition, when X₁ = X₂, it also follows that X̄ = X₁ = X₂. As a consequence, every point on line CD has a pair of equal coordinates whose value is the mean of those samples lying on the line perpendicular to CD through this point. This follows from the fact that on a line perpendicular to CD, such as AB, the mean is constant.

As a result of these geometrical properties, the mean of any sample in the diagram can be found immediately as follows: First locate the sample point on the diagram. Then proceed up or down from this point along a line perpendicular to the line CD until the intersection with CD is reached. The mean of the sample is either the X₁ or the X₂ coordinate (for X₁ = X₂) of this intersection point. For example, the sample point (84,106) lies on a line that intersects CD in a point whose X₂ coordinate is 95. Hence the mean of this sample is 95.
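The projection rule can be written out in a few lines: dropping a perpendicular from (X₁, X₂) to CD lands on the point whose coordinates both equal (X₁ + X₂)/2, which is the sample mean. The function names in this sketch are hypothetical.

```python
def sample_mean(x1: float, x2: float) -> float:
    return (x1 + x2) / 2


def foot_on_cd(x1: float, x2: float) -> tuple[float, float]:
    """Foot of the perpendicular from the sample point (x1, x2) to CD, the line X1 = X2."""
    # CD runs in the direction (1, 1); projecting onto it averages the two coordinates.
    t = (x1 + x2) / 2
    return (t, t)


# The points named on line AB, plus the sample (84, 106) from the text.
for point in [(100, 90), (98, 92), (94, 96), (88, 102), (102, 88), (84, 106)]:
    print(point, sample_mean(*point), foot_on_cd(*point))
```

Every point listed projects to (95, 95), in agreement with the statement that line AB is the locus of samples whose mean is 95.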

These properties of Fig. 64 also make it possible to find the interval to which any sample mean belongs. Thus suppose that intervals are marked off on the X₂- (or X₁-) axis, the corresponding points on the line CD are found by vertical (or horizontal) projection, and lines perpendicular to CD are drawn through these points. Then to find the interval to which the mean of a sample belongs it is necessary merely to find the pair of lines between which the sample lies. For example, the line perpendicular to CD through the point on CD whose X₂ coordinate is 95 is the line AB, and the line perpendicular to CD through the point whose X₂ coordinate is 97 is the line A'B'. Any sample lying between these two lines will have a mean lying between 95 and 97. This geometrical property of Fig. 64 will be found very useful in deriving the distribution of sample means.

Geometrical Measurement of the Variance and Standard Deviation of a Sample. Figure 64 may also be used to find the interval within which the variance and standard deviation of a sample lie. First note that by definition the variance of a sample of two items is

$$\sigma^2 = \frac{(X_1 - \bar X)^2 + (X_2 - \bar X)^2}{2}$$

With reference to Fig. 64 this may be interpreted as one-half the square of the distance¹ from the point whose sample values are both equal to X̄ to the point whose sample values are X₁ and X₂. For example, one-half the square of the distance from the point (102,88) to the point (95,95) is the variance of the sample 102,88. Likewise, one-half the square of the distance from the point (108,82) to the point (95,95) is the variance of the sample 108,82, a sample that has the same mean as 102,88 but a different variance. Similarly, one-half the square of the distance from the point (102,92) to the point (97,97) is the variance of the sample 102,92, a sample whose mean and variance both are different from those of the sample 102,88.

It follows from this that all sample points equally distant from the points representing their means have the same variances. Since the line CD is the locus of the points representing the means of the various samples,² it may be concluded that all sample points equally distant from the line CD (in a perpendicular direction) have the same variances. Thus, if intervals are marked off on some line perpendicular to CD and if lines are drawn parallel to CD through the end points of these intervals, the interval to which the variance of any sample belongs may be found by merely noting the pair of lines between which it lies. For example, the line EF is 2√2 units distant from the line CD.³ All samples on this line therefore have variances equal to ½(2√2)² = 4. The line GH is 4√2 units distant from line CD, and all samples on this line have variances equal to 16. The sample 98,104 lies between these two lines and hence has a variance lying between 4 and 16.
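The identity behind this construction, that the variance of a sample of 2 equals one-half the squared perpendicular distance from its point to CD, can be checked for the samples named above. A minimal sketch with hypothetical names:

```python
import math


def pair_variance(x1: float, x2: float) -> float:
    """Variance of a sample of two items, divisor N = 2 as in the text."""
    mean = (x1 + x2) / 2
    return ((x1 - mean) ** 2 + (x2 - mean) ** 2) / 2


def distance_from_cd(x1: float, x2: float) -> float:
    """Perpendicular distance from (x1, x2) to the line X1 = X2."""
    return abs(x1 - x2) / math.sqrt(2)


for x1, x2 in [(102, 88), (108, 82), (102, 92), (98, 104)]:
    d = distance_from_cd(x1, x2)
    print((x1, x2), pair_variance(x1, x2), round(d ** 2 / 2, 9))
```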

Since the standard deviation of a sample is equal to the square root of the variance, the above method of finding the variance of a sample also automatically gives its standard deviation. Thus the lines EF and GH include samples whose standard deviations lie between 2 and 4.

¹ It will be recalled that the distance between points (X₁,Y₁) and (X₂,Y₂) is √[(X₁ − X₂)² + (Y₁ − Y₂)²].

² See pp. 225-226.

³ Each side of a cell is equal to 2 units; therefore, the diagonal is equal to 2√2 units.

This geometrical method of finding the intervals to which the variance and standard deviation of a sample belong is very helpful in deriving the sampling distributions of these two statistics.


Geometric Measurement of the Sample Statistic √N(X̄ − X̃)/σ̂. As will be pointed out later, when the standard deviation of a population is not known and the sample is small, the statistic

$$\frac{\sqrt{N}\,(\bar X - \tilde X)}{\hat\sigma}$$

is the best to employ in testing hypotheses and determining confidence limits for the mean of the population. In this statistic, X̄ represents the mean of the sample, X̃ is the hypothetical value for the mean of the population, and σ̂ is the maximum-likelihood estimate of the standard deviation of the population. The relationship between the value of σ̂ and the standard deviation of the sample is

$$\hat\sigma = \sigma\sqrt{\frac{N}{N-1}}$$

The derivation of this maximum-likelihood value is given in the next chapter.

In Fig. 64, the statistic √N(X̄ − X̃)/σ̂ may be interpreted geometrically as follows: First, in Figs. 65a and 65b, note that √N(X̄ − X̃), which for samples of 2 becomes √2(X̄ − X̃), is equal to the distance from the point h, whose coordinates are both equal to X̃, to the point m, whose coordinates are both equal to X̄.* Again note that, for samples of 2, σ̂ = σ√2 and that this latter is the perpendicular distance from the sample point (X₁, X₂) to the mean point (X̄, X̄). Hence, if the sample point lies above the line CD, as in Fig. 65a, the statistic √N(X̄ − X̃)/σ̂ is but the cotangent of the angle¹ that the line connecting that point with the central point (X̃, X̃) makes with CD. If the sample point lies below CD, as in Fig. 65b, then the statistic equals minus the cotangent of the angle that the line connecting the sample point with the central point (X̃, X̃) makes with CD.

If lines are thus drawn through the central point (X̃, X̃) so that the cotangents of the angles they make with CD form regular intervals of, say, .2, then to find the interval within which the statistic √N(X̄ − X̃)/σ̂ lies for any sample it is necessary merely to note between what pair of lines the sample point lies. For example, the line LL in Fig. 64 makes an angle with CD the cotangent of which is 2, and the line KK makes an angle the cotangent of which is 1.8. Hence, the sample 120,106, which lies between these two lines and above CD, has a value for the statistic that lies between 1.8 and 2 (its exact value is 1.857). Similarly, the lines L'L' and K'K' form angles with CD of which the cotangents are −2 and −1.8. Hence the sample 106,120, which lies between these lines and below CD, also has a value for the statistic that lies between 1.8 and 2.
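Both routes to the value 1.857 for the sample 120,106 (central point 100, 100) can be compared directly: the algebraic statistic with σ̂ = σ√(N/(N − 1)), and the cotangent ratio of the distance along CD to the distance off CD. In this sketch the function names are hypothetical, and the cotangent function ignores which side of CD the point lies on, so the sign convention of the text is not modeled. A sample with X₁ = X₂ lies on CD and would give a division by zero.

```python
import math


def mean_statistic(x1: float, x2: float, hypothetical_mean: float) -> float:
    """sqrt(N) (sample mean - hypothetical mean) / sigma_hat for N = 2,
    with sigma_hat = sigma * sqrt(N / (N - 1)) as stated in the text."""
    n = 2
    mean = (x1 + x2) / n
    sigma = math.sqrt(((x1 - mean) ** 2 + (x2 - mean) ** 2) / n)
    sigma_hat = sigma * math.sqrt(n / (n - 1))
    return math.sqrt(n) * (mean - hypothetical_mean) / sigma_hat


def cot_angle_with_cd(x1: float, x2: float, hypothetical_mean: float) -> float:
    """(Distance from h to m along CD) / (distance from m to the sample point),
    the cotangent of the angle at the central point; side of CD is ignored."""
    mean = (x1 + x2) / 2
    along_cd = math.sqrt(2) * (mean - hypothetical_mean)
    off_cd = abs(x1 - x2) / math.sqrt(2)
    return along_cd / off_cd


print(round(mean_statistic(120, 106, 100), 3))     # 1.857
print(round(cot_angle_with_cd(120, 106, 100), 3))  # 1.857
```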

* CD makes a 45-degree angle with the X₁-axis, since it is the locus of all points whose two coordinates are identical. Hence in the right triangle hlm the distance mh = √2 times the distance ml.

¹ All angles are viewed as being measured counterclockwise from CD.

230 ELEMENTARY THEORY OF RANDOM SAMPLING

SAMPLING DISTRIBUTION OF THE MEAN

The foregoing properties of the distribution of samples of 2 from a normal population offer a ready means of obtaining the sampling distributions of various sample statistics. Consider first the sampling distribution of the mean.

By definition, the sampling distribution of sample means is a description of the relative frequencies with which samples assume various mean values. In any concrete problem, such as is involved here, it will consist of a list of intervals together with the relative frequencies of samples whose means fall in those intervals. As pointed out above, it is possible from Fig. 64 to find the interval within which the mean of any sample of 2 lies. Hence to find the sampling distribution of the mean of two cases it is necessary only to lay off a given range of intervals for the mean and then determine from Fig. 64 the relative frequencies of samples lying in the various intervals. The exact procedure may be illustrated by the following example:

Suppose it is desired to find the relative frequency, or probability, of a mean lying between 95 and 97 (i.e., from 95 up to but not including 97). To do this, first proceed from the points 95 and 97 on the X₂-axis to the points vertically above on the CD line. Draw lines through these points perpendicular to CD. These are the lines AB and A'B' of Fig. 64. The relative frequency, or probability, of samples having mean values lying between 95 and 97 is equal to the relative frequency, or probability, of samples lying between the lines AB and A'B'. This is equal to the sum of the probabilities of all cells completely included between these two lines, i.e.,

$$\frac{535.6}{100{,}000} + \frac{494.5}{100{,}000} + \frac{421.7}{100{,}000} + \cdots$$

plus the prorated share of the probability of those cells only partly included between these two lines, that is,

$$\frac{1}{2}\cdot\frac{557.4}{100{,}000} + \frac{1}{2}\cdot\frac{475.2}{100{,}000} + \frac{1}{2}\cdot\frac{421.7}{100{,}000} + \frac{1}{2}\cdot\frac{494.5}{100{,}000} + \cdots$$

The grand total is 9,570.8/100,000, and this is accordingly the probability of a sample having a mean lying between 95 and 97.
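The strip-by-strip reasoning can also be carried out with very thin strips instead of the 2-unit cells of Fig. 64. This sketch (hypothetical names) replaces the prorating of partly included cells with exact normal areas for X₂ within each strip of X₁, so it lands slightly above the text's .09571, near .0959. For comparison it also evaluates the closed form that uses the result stated in the next paragraphs, namely that the mean of 2 has half the population variance.

```python
import math


def phi(z: float) -> float:
    """Standard normal area to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MEAN, SD = 100.0, 10.0  # the population of Table 25


def prob_mean_between(a: float, b: float, step: float = 0.01) -> float:
    """P(a <= (X1 + X2)/2 < b), summed over thin strips of X1.

    Within a strip, X2 must lie between 2a - X1 and 2b - X1, which is the band
    between the two lines perpendicular to CD (AB and A'B' when a, b = 95, 97)."""
    lo = MEAN - 8 * SD
    n_strips = int(round(16 * SD / step))
    total = 0.0
    for i in range(n_strips):
        x1 = lo + i * step
        strip = phi((x1 + step - MEAN) / SD) - phi((x1 - MEAN) / SD)
        mid = x1 + step / 2
        band = phi((2 * b - mid - MEAN) / SD) - phi((2 * a - mid - MEAN) / SD)
        total += strip * band
    return total


numeric = prob_mean_between(95, 97)
sd_mean = SD / math.sqrt(2)  # standard deviation of the mean of 2
closed_form = phi((97 - MEAN) / sd_mean) - phi((95 - MEAN) / sd_mean)
print(round(numeric, 4), round(closed_form, 4))  # 0.0959 0.0959
```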

When the procedure just described is applied to finding the probabilities of sample means lying within each of the class intervals 75-, 77-, etc., the results are those shown in Table 26. From this table and from the accompanying chart it can be seen that the mean of this distribution is the same as the mean of the population and that its variance is smaller. As a matter of fact, numerical calculation shows that the variance of this sampling distribution of the mean of two items is one-half the variance of the population. Furthermore, the figure suggests that the sampling distribution of the mean has the form of a normal frequency curve.
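The numerical finding can also be checked by simulation. The sketch below is a hypothetical Monte Carlo check, not the authors' method: it assumes a normal population with $\tilde X = 100$ and $\tilde\sigma = 10$ (the population used in the tables of this section), draws many samples of two, and compares the mean and variance of the sample means with the population values. The seed and the number of trials are arbitrary choices.

```python
import random

def simulate_means_of_two(pop_mean=100.0, pop_sd=10.0, trials=100_000, seed=1):
    """Draw `trials` samples of two from a normal population; return the sample means."""
    rng = random.Random(seed)
    return [
        (rng.gauss(pop_mean, pop_sd) + rng.gauss(pop_mean, pop_sd)) / 2
        for _ in range(trials)
    ]

means = simulate_means_of_two()
mean_of_means = sum(means) / len(means)
# Divisor N (not N - 1), matching the text's definition of variance.
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)
print(f"mean of sample means     = {mean_of_means:.2f} (population mean 100)")
print(f"variance of sample means = {var_of_means:.2f} (population variance / 2 = 50)")
```

With 100,000 trials the simulated variance should lie within a fraction of a unit of 50, the value the algebraic result predicts.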

Fig. 66.—Comparison of a normal population with the sampling distribution of means (N = 2). Data from Table 26.

These indications of the numerical analysis are verified by algebraic analysis.¹ This shows that in general the sampling distribution of the mean is normal in form, that its mean is the mean of the population from which the samples have been drawn, and that its variance is $1/N$th the variance of the population. The algebraic equation for the sampling distribution of the mean is thus

$$dP(\bar X) = \frac{1}{\tilde\sigma_{\bar X}\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma_{\bar X}^2}\right] d\bar X \tag{1}$$

where

$$\tilde\sigma_{\bar X} = \frac{\tilde\sigma}{\sqrt{N}}$$
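Table 26 can be compared with values computed directly from Eq. (1). The sketch below assumes, as the population column of Table 26 implies, a normal population with $\tilde X = 100$ and $\tilde\sigma = 10$, and integrates the normal curves with the standard library's `math.erf`. Because column (2) of the table was built up from cell sums in Fig. 64, agreement should be expected only to about the third decimal place.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal curve, written with math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def interval_prob(lower, width, mean, sd):
    """Probability of the class interval from `lower` up to (not including) `lower + width`."""
    return normal_cdf(lower + width, mean, sd) - normal_cdf(lower, mean, sd)

POP_MEAN, POP_SD, N = 100.0, 10.0, 2
SD_OF_MEAN = POP_SD / math.sqrt(N)   # standard deviation of the curve in Eq. (1)

rows = []
for lower in range(75, 125, 2):      # lower limits 75, 77, ..., 123 as in Table 26
    rows.append((lower,
                 interval_prob(lower, 2, POP_MEAN, SD_OF_MEAN),   # column (2)
                 interval_prob(lower, 2, POP_MEAN, POP_SD)))      # column (3)

for lower, p_mean, p_pop in rows:
    print(f"{lower:>4}-   {p_mean:.5f}   {p_pop:.5f}")
```

The population column reproduces the tabled figures almost exactly; the sample-mean column differs from the cell-sum figures by a few units in the fourth decimal.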


1 See Appendix to this chapter (p. 251). 
Table 26.—Comparison of the Sampling Distribution of the Mean
with the Distribution of the Population

  Lower limits of          Probabilities of
  class intervals      Sample mean     Population
       (1)                 (2)             (3)

       75-               .00036          .00451
       77-               .00093          .00714
       79-               .00215          .01086
       81-               .00456          .01585
       83-               .00895          .02224
       85-               .01619          .02999
       87-               .02705          .03887
       89-               .04177          .04839
       91-               .05960          .05790
       93-               .07856          .06658
       95-               .09571          .07355
       97-               .10773          .07808
       99-               .11206          .07968
      101-               .10773          .07808
      103-               .09571          .07355
      105-               .07856          .06658
      107-               .05960          .05790
      109-               .04177          .04839
      111-               .02705          .03887
      113-               .01619          .02999
      115-               .00895          .02224
      117-               .00456          .01585
      119-               .00215          .01086
      121-               .00093          .00714
      123-               .00036          .00451


SAMPLING DISTRIBUTIONS OF THE VARIANCE 
AND STANDARD DEVIATION 

The sampling distribution of sample variances gives the 
relative frequencies, or probabilities, of samples having various 
values for their variances. In numerical form, it lists certain 
intervals and records the relative frequencies of samples whose 
variances fall in these intervals. 

The geometrical properties of Fig. 64 permit as ready a derivation of the distribution of the variances of samples of 2 as they did of the distribution of the means of samples of 2. For, as pointed out above,¹ all sample points equally distant from the line CD (in a perpendicular direction) have the same variance. Thus a set of lines can be drawn parallel to the line CD, and by computing the relative frequency, or probability, of the samples falling between various pairs of lines, the relative frequency, or probability, of a sample having a given variance can be determined. The procedure may be illustrated by an example.

The sample 98, 94 has a mean of 96 and a variance of 4. The point representing this sample in Fig. 64 lies on the line EF running parallel to line CD. According to the previous paragraph, all other sample points lying on the line EF have the same variance. The sample 100, 92 has a mean of 96 and a variance of 16. The point representing this sample in Fig. 64 lies on the line GH, also running parallel to CD. All sample points lying on line GH therefore have a variance of 16. It also follows that all sample points lying between lines EF and GH have variances lying between 4 and 16. Furthermore, since all points on the line E'F' of Fig. 64 are the same distance from line CD as those on line EF, they also have sample variances of 4, and likewise all points on the line G'H' have sample variances of 16. Hence the total set of samples whose variances lie between 4 and 16 consists of all the samples lying between the lines EF and GH on the one hand and the lines E'F' and G'H' on the other hand. The relative frequency, or probability, of a sample having a variance between 4 and 16 is therefore the relative frequency, or probability, of a sample lying between EF and GH, plus the relative frequency, or probability, of a sample lying between E'F' and G'H'. As in the case of the means, this total probability may be computed directly from Fig. 64 by adding the probabilities of all the cells and sections of cells included between these two pairs of lines. The result in this particular case is found to be 0.20510. It is suggested that the reader check this by carrying out the calculations himself.
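The figure 0.20510 can also be checked against the $\chi^2$ result stated later in this section: for samples of two, $2\sigma^2/\tilde\sigma^2$ has the $\chi^2$ distribution with one degree of freedom. The sketch below assumes $\tilde\sigma = 10$, as in Table 27, and uses the fact that a $\chi^2$ variate with one degree of freedom is the square of a unit normal variate, so its cumulative probability is $\operatorname{erf}\sqrt{y/2}$. The function names are illustrative.

```python
import math

def chi2_cdf_one_df(y):
    """P(Z**2 <= y) for a unit normal Z, i.e. the chi-square CDF with one degree of freedom."""
    return math.erf(math.sqrt(y / 2.0))

def prob_variance_between(lo, hi, pop_var=100.0):
    """Probability that the variance (divisor 2) of a sample of two lies in [lo, hi)."""
    n = 2
    # n * variance / pop_var has a chi-square distribution with n - 1 = 1 degree of freedom.
    return chi2_cdf_one_df(n * hi / pop_var) - chi2_cdf_one_df(n * lo / pop_var)

p_4_16 = prob_variance_between(4, 16)
print(f"P(4 <= variance < 16) = {p_4_16:.5f}")
```

The result, about .2057, differs from .20510 only through the cell-by-cell approximation used with Fig. 64.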

By repeated applications of the foregoing procedure, the probabilities of samples having variances lying between other class limits may readily be computed. The results are summarized in Table 27 and pictured graphically in Fig. 67. This represents the sampling distribution of the variances of samples of 2. Owing to the small size of the sample, the shape of the distribution is unusual. A more common shape is shown by the distribution of samples of 11, which also is illustrated in Fig. 67 (see also Table 28).

1 See pp. 227-228.


Fig. 67. Effect of size of sample on the sampling distribution of variances (distributions of variances of samples of two and samples of eleven).

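Algebraic analysis¹ shows that the sampling distribution of variances of samples of N has the following equation:

$$dP(\sigma^2) = \frac{N^{\frac{N-1}{2}}\,(\sigma^2)^{\frac{N-3}{2}}}{(2\tilde\sigma^2)^{\frac{N-1}{2}}\,\Gamma\!\left(\frac{N-1}{2}\right)}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right] d\sigma^2 \tag{2}$$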
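As a rough numerical check on Eq. (2), the sketch below codes the density, integrates it with Simpson's rule, and locates its peak on a grid. It assumes N = 11 and $\tilde\sigma = 10$, as in Table 28. The peak should fall at $\sigma^2 = (N-3)\tilde\sigma^2/N$, where the derivative of $(\sigma^2)^{(N-3)/2}e^{-N\sigma^2/2\tilde\sigma^2}$ vanishes. For N = 2 the density is infinite at $\sigma^2 = 0$, so this simple integration would need different treatment there.

```python
import math

def variance_density(s2, n, pop_var):
    """Density of the sample variance (divisor n) for samples of n from a normal population, Eq. (2)."""
    k = n - 1  # degrees of freedom
    coef = n ** (k / 2) / ((2 * pop_var) ** (k / 2) * math.gamma(k / 2))
    return coef * s2 ** ((n - 3) / 2) * math.exp(-n * s2 / (2 * pop_var))

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

N, POP_VAR = 11, 100.0

def density(v):
    return variance_density(v, N, POP_VAR)

total_area = simpson(density, 0.0, 1000.0)   # should be very close to 1
p_64_100 = simpson(density, 64.0, 100.0)     # compare with the sigma = 8- row of Table 28
peak = max((i * 0.1 for i in range(1, 5000)), key=density)
print(f"area {total_area:.5f}, P(64 <= var < 100) {p_64_100:.5f}, peak near {peak:.1f}")
```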


For N greater than 3, this represents a curve that starts at 0, rises to a peak at $\sigma^2 = \frac{N-3}{N}\tilde\sigma^2$, and approaches 0 again as $\sigma^2$ goes to infinity. For small values of N the curve is thus very skewed. For large values of N, however, the curve is more symmetrical and almost normal in form, the mean of the curve being approximately the variance of the population, $\tilde\sigma^2$, and its standard deviation being approximately $\tilde\sigma^2\sqrt{2/N}$.

For small samples, the sampling distribution of the variance is thus nonnormal, and the normal probability tables cannot be used in testing hypotheses and determining confidence limits. It can be shown, however, that if the unit of measurement is taken as $1/N$ times the variance of the population (that is, $\tilde\sigma^2/N$), then the distribution takes on the form of the $\chi^2$


1 See Appendix to this chapter (p. 263). 
Table 27.—Sampling Distribution of the Variance and Standard Deviation of Samples of 2 from a Normal Population Whose $\tilde\sigma$ = 10 (N = 2)

       Values of
   $\sigma$      $\sigma^2$      Probability

     0-          0-           .22193
     2-          4-           .20510
     4-         16-           .17515
     6-         36-           .13820
     8-         64-           .10078
    10-        100-           .06790
    12-        144-           .04228
    14-        196-           .02432
    16-        256-           .01291
    18-        324-           .00632
    20-        400-           .00285
    22-        484-           .00118
    24-        576-           .00042


Table 28.—Sampling Distribution of Variances and Standard Deviations of Samples of 11 from a Normal Population Whose $\tilde\sigma$ = 10 (N = 11)

       Values of
   $\sigma$      $\sigma^2$      Probability

     0-          0-           .00008
     2-          4-           .00248
     4-         16-           .04847
     6-         36-           .22713
     8-         64-           .36406
    10-        100-           .25270
    12-        144-           .08718
    14-        196-           .01592
    16-        256-           .00095
    18-        324-           .00005

    Total                     .99902
distribution. More specifically, it can be shown that the quantity $\dfrac{\sigma^2}{\tilde\sigma^2/N} = \dfrac{N\sigma^2}{\tilde\sigma^2}$ has a sampling distribution that is of the form of the $\chi^2$ distribution with n in the $\chi^2$ equation equal to $N - 1$. Tables of the $\chi^2$ distribution can therefore be used in computing probabilities of various sample variances. This use of the $\chi^2$ tables will be explained more fully below.¹

The sampling distribution of the standard deviation of samples of 2 is found from Table 27 by taking the intervals 0-, 2-, 4-, 6-, 8-, etc., the limits being the square roots of the limits adopted for the variance. The result is shown in Fig. 68. The figure also gives the distribution of the standard deviation for samples of 11.
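The $\chi^2$ result makes Table 28 easy to recompute. The sketch below assumes $\tilde\sigma = 10$ and N = 11, as in that table; each class of $\sigma$ is converted to limits for $N\sigma^2/\tilde\sigma^2$, and the $\chi^2$ probabilities with $N - 1 = 10$ degrees of freedom are differenced. The closed form used here for the $\chi^2$ cumulative probability (a finite Poisson sum) holds only for an even number of degrees of freedom; that is a restriction of this sketch, not of the method.

```python
import math

def chi2_cdf_even_df(y, df):
    """Chi-square cumulative probability for an even number of degrees of freedom.

    Uses the closed form 1 - exp(-y/2) * sum over k < df/2 of (y/2)**k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles only even, positive degrees of freedom")
    half = y / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

N, POP_VAR = 11, 100.0
DF = N - 1
sigma_limits = list(range(0, 22, 2))   # class limits for sigma: 0, 2, ..., 20

probs = []
for lo, hi in zip(sigma_limits, sigma_limits[1:]):
    # Square the sigma limits, then rescale: N * variance / population variance is chi-square.
    p = chi2_cdf_even_df(N * hi**2 / POP_VAR, DF) - chi2_cdf_even_df(N * lo**2 / POP_VAR, DF)
    probs.append(p)
    print(f"sigma {lo:>2}-  variance {lo**2:>3}-  probability {p:.5f}")
```

The computed probabilities agree with Table 28 to about the fourth decimal place.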


Fig. 68. Effect of size of sample on sampling distribution of the standard deviation (distributions of standard deviations of samples of two and samples of eleven).

The equation for the sampling distribution of the standard deviation is

$$dP(\sigma) = \frac{N^{\frac{N-1}{2}}\,\sigma^{N-2}}{2^{\frac{N-3}{2}}\,\tilde\sigma^{N-1}\,\Gamma\!\left(\frac{N-1}{2}\right)}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\right] d\sigma \tag{3}$$

This equation is not used in practical analysis, since it is easier to work with the variance $\sigma^2$ and to use tables of the $\chi^2$ distribution as just explained.


SAMPLING DISTRIBUTION OF $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$

The sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ gives the relative frequencies with which samples take on various values of this statistic. In numerical cases, it lists intervals of this statistic and gives the relative frequencies of samples belonging to these intervals.

1 See pp. 284-289.


It was pointed out above that if a line is drawn through a sample point connecting it with the central point $(\tilde X, \tilde X)$, the value of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ for the sample is equal to the cotangent of the angle that this line makes with CD if the sample point lies above CD, or to minus the cotangent of this angle if the sample point lies below CD, all angles to be read counterclockwise.


The relative frequency, or probability, of samples having values of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ lying between certain limits is accordingly the relative frequency with which samples fall between the radii through the point $(\tilde X, \tilde X)$ whose cotangents are equal to these limits. For example, in Fig. 64, the cotangent of the angle that LL makes with CD is 2.0, and the cotangent of the angle that KK makes with CD is 1.8; also, the cotangents of the angles that L'L' and K'K' make with CD are -2.0 and -1.8, respectively. Hence, the samples for which $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ lies between 1.8 and 2.0 are the samples included between the lines LL and KK and lying above CD and the samples included between the lines L'L' and K'K' and lying below CD. Similarly, the samples for which $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ lies between -1.8 and -2.0 are the samples included between LL and KK and lying below CD and the samples included between L'L' and K'K' and lying above CD. Since the sample points are distributed with circular symmetry about the point $(\tilde X, \tilde X)$,¹ it follows that the relative frequency of samples included between LL and KK and lying above CD is proportional to the angle between these two lines. More specifically, it is equal to the ratio that this angle bears to 360 deg. Since in Fig. 64 angle LXD = 26.56 deg and angle KXD = 29.05 deg, the difference is 2.49 deg, and it follows that the relative frequency, or probability, of samples included between the lines LL and KK and lying above CD is equal to $2.49/360 = .0069$. Owing to the symmetry of the probabilities represented by Fig. 64, this is also the probability of a sample included between L'L' and K'K' and lying below CD. Hence the probability of a sample having a value of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ between 1.8 and 2.0 is .0138; similarly, the probability of a sample having a value of that statistic between -1.8 and -2.0 is .0138.

1 See pp. 224-225.



The sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ can therefore be readily obtained for samples of 2 as follows: Find the angles for which the cotangents are, say, -0.05, 0.05, 0.15, 0.25, 0.35, etc.; take the successive differences between these angles, and double the results; finally, divide these results by 360. This quotient, for each successive interval, will be the probability of a sample having a value of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ lying between -0.05 and 0.05, between 0.05 and 0.15, between 0.15 and 0.25, between 0.25 and 0.35, etc. Owing to the symmetry of the probabilities represented in Fig. 64, the probabilities of negative values of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ are the
Table 29.—Sampling Distribution of the Statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ for Samples of Two from a Normal Population ($\tilde X$ = 100 and $\tilde\sigma$ = 10)

     (1)          (2)           (3)            (4)             (5)            (6)
  Cotangent    Angle with    Angle          Twice angle     Class          Probabilities
  of angle     CD line,      intervals,     intervals       intervals in
  with CD      deg           e.g., ∠LXK,    divided by      the statistic
  line                       deg            360 deg

Columns (3) to (6) on each line refer to the interval from that line's cotangent to the next cotangent listed (for the first line, from -.05 to .05).

    -.05        92.862        5.724          .03180          -.05-          .03180
     .00        90.000
     .05        87.138        5.669          .03149           .05-          .03149
     .15        81.469        5.506          .03059           .15-          .03059
     .25        75.963        5.253          .02918           .25-          .02918
     .35        70.710        4.939          .02744           .35-          .02744
     .45        65.771        4.580          .02544           .45-          .02544
     .55        61.191        4.215          .02342           .55-          .02342
     .65        56.976        3.846          .02137           .65-          .02137
     .75        53.130        3.493          .01940           .75-          .01940
     .85        49.637        3.167          .01759           .85-          .01759
     .95        46.470        2.867          .01593           .95-          .01593
    1.05        43.603        2.593          .01440          1.05-          .01440
    1.15        41.010        2.350          .01306          1.15-          .01306
    1.25        38.660        2.132          .01184          1.25-          .01184
    1.35        36.528        1.936          .01076          1.35-          .01076
    1.45        34.592        1.763          .00979          1.45-          .00979
    1.55        32.829        1.611          .00895          1.55-          .00895
    1.65        31.218        1.475          .00819          1.65-          .00819
    1.75        29.743        1.355          .00753          1.75-          .00753
    1.85        28.388        1.238          .00688          1.85-          .00688
    1.95        27.150        1.150          .00639          1.95-          .00639
    2.05        26.000        1.056          .00587          2.05-          .00587
    2.15        24.944         .980          .00544          2.15-          .00544
    2.25        23.964         .914          .00508          2.25-          .00508
    2.35        23.050         .850          .00472          2.35-          .00472
    2.45        22.200         .785          .00436          2.45-          .00436
    2.55        21.415         .744          .00413          2.55-          .00413
    2.65        20.671         .691          .00384          2.65-          .00384
    2.75        19.980         .642          .00357          2.75-          .00357
    2.85        19.338         .614          .00341          2.85-          .00341
    2.95        18.724         .568          .00316          2.95-          .00316
    3.05        18.156         .546          .00303          3.05-          .00303
    3.15        17.610         .505          .00280          3.15-          .00280
    3.25        17.105         .486          .00270          3.25-          .00270
    3.35        16.619         .454          .00252          3.35-          .00252
    3.45        16.165         .432          .00240          3.45-          .00240
    3.55        15.733         .413          .00229          3.55-          .00229
    3.65        15.320         .389          .00216          3.65-          .00216
    3.75        14.931         .370          .00206          3.75-          .00206
    3.85        14.561         .354          .00197          3.85-          .00197
    3.95        14.207         .339          .00188          3.95-          .00188
    4.05        13.870         .321          .00178          4.05-          .00178
    4.15        13.549         .308          .00171          4.15-          .00171
    4.25        13.241         .295          .00164          4.25-          .00164
    4.35        12.946         .281          .00156          4.35-          .00156
    4.45        12.665         .270          .00150          4.45-          .00150
    4.55        12.395         .257          .00143          4.55-          .00143
    4.65        12.138         .250          .00139          4.65-          .00139
    4.75        11.888
same as the probabilities of positive values. The sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is thus symmetrical about zero.

The calculations here indicated are carried out in Table 29 and 
the results pictured graphically in Fig. 69. 

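Algebraic analysis shows that, in general, the sampling distribution of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is of the form of the t (or Student's) distribution, with the n in the equation equal to $N - 1$. For samples of any size the equation for the t distribution is as follows:¹

$$dP(t) = \frac{\Gamma\!\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\!\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}} dt$$

where $t = \sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ and $n = N - 1$.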
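Because the sampling distribution is of the t form, the angle method used for samples of two must agree with the t equation for n = 1. The sketch below is a hypothetical numerical check: it codes the t curve, integrates it with Simpson's rule between 1.8 and 2.0, and compares the result with the angle-based probability for the region between LL and KK (and its mirror image) worked out earlier.

```python
import math

def t_density(t, n):
    """Ordinate of Student's t curve with n degrees of freedom."""
    coef = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return coef * (1 + t * t / n) ** (-(n + 1) / 2)

def simpson(f, a, b, steps=1000):
    """Composite Simpson's rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    inner = sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, steps))
    return (f(a) + f(b) + inner) * h / 3

# Samples of two: n = N - 1 = 1.
p_t = simpson(lambda t: t_density(t, 1), 1.8, 2.0)

# Angle method: angles (from CD) whose cotangents are 1.8 and 2.0, difference doubled, over 360 deg.
angle_diff = math.degrees(math.atan(1 / 1.8) - math.atan(1 / 2.0))
p_angle = 2 * angle_diff / 360

print(f"t curve: {p_t:.5f}   angle method: {p_angle:.5f}")
```

Both routes give about .0138, as in the text; for n = 1 the t curve is exactly the one implied by circular symmetry.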

As might have been inferred from the numerical analysis, the sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is independent of the hypothetical values assumed for the mean or standard deviation of the population and varies only with N, the size of the sample. For large values of N the curve is approximately normal with a mean of 0 and a variance approximately equal to 1. In these cases the normal probability table can be used in place of the t table.



SAMPLING DISTRIBUTIONS OF OTHER STATISTICS 


In estimating the properties of a normal population, the sampling distributions of the mean, of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$, and of the variance (or standard deviation) are the most important. For a normal distribution is characterized by its mean and standard deviation, and these are best estimated by the sample mean and the sample variance (or standard deviation), whose sampling distributions have been given above.

It is also possible to estimate the mean of the population from the median of a sample, and in the absence of knowledge of the sample mean this would be the next best statistic. As in the case of the mean, the sampling distribution of the median is normal (at least approximately, for large samples), and its mean is the mean (= median) of the population. The standard error of the median, however, equals approximately $1.2533\,\tilde\sigma/\sqrt{N}$, which is about 1¼ times as large as the standard error of the mean. It is this greater sampling error that makes the median less "efficient" than the mean in setting up confidence limits for the mean (= median) of the population. For the confidence interval derived from the median will, for a given confidence coefficient, be about 1¼ times as wide as that derived from the mean and will therefore not give as good an estimate of the population mean, or median.

1 See Appendix to this chapter (p. 266).
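The factor 1.2533 is $\sqrt{\pi/2}$, the large-sample ratio of the standard error of the median to that of the mean for a normal population. The simulation sketch below (sample size, trial count, and seed are arbitrary choices) estimates that ratio for samples of 101 drawn from a unit normal population; for a finite sample the simulated ratio should be close to, though not exactly, 1.2533.

```python
import math
import random
import statistics

def se_ratio_median_to_mean(n=101, trials=4000, seed=7):
    """Simulated ratio of the standard error of the sample median to that of the sample mean."""
    rng = random.Random(seed)
    medians, means = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
        means.append(sum(sample) / n)
    return statistics.pstdev(medians) / statistics.pstdev(means)

ratio = se_ratio_median_to_mean()
print(f"simulated ratio {ratio:.3f}; sqrt(pi/2) = {math.sqrt(math.pi / 2):.4f}")
```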

In instances where time and economy of effort are important, it is also possible to estimate the variance (or standard deviation) of a normal population from the sample range.¹ It has been found possible to derive the mean range, the standard error of the range, and the upper and lower .001, .005, .010, .025, .050, and .100 points of the sampling distribution of the range for samples of 2 to 20 cases.² These data are reproduced in Table XIII of the Appendix. The unit for the table is the standard deviation of the population, and the table can thus be used to estimate the population standard deviation from the sample range. The procedure is discussed on pages 294-296.
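Table XIII is not reproduced here, but the idea behind its use can be sketched. For a normal population the mean range of samples of N, expressed in units of the population standard deviation, is a constant for each N; dividing an observed range by that constant estimates the population standard deviation. The sketch below estimates the constant for N = 5 by simulation (it should come out near 2.326, the commonly tabulated value) and applies it to a hypothetical sample of five; the data and function name are illustrative only.

```python
import random

def mean_range_unit_normal(n, trials=100_000, seed=3):
    """Simulated mean range of samples of n from a normal population with standard deviation 1."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sample) - min(sample)
    return total / trials

d_5 = mean_range_unit_normal(5)              # should be near 2.326
observed = [97.2, 104.8, 101.5, 93.9, 99.0]  # hypothetical sample of five
sample_range = max(observed) - min(observed)
sigma_hat = sample_range / d_5               # estimate of the population standard deviation
print(f"mean range (unit sd, N = 5) ~ {d_5:.3f}; estimated sd ~ {sigma_hat:.2f}")
```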

The sample statistics $\sqrt{\beta_1}$ and $\beta_2$ are important as measures of departure from normality. If the population is normal, the sampling distributions of these statistics tend to normality with increasing size of the sample. Their means are 0 and (approximately) 3, and their standard deviations are approximately $\sqrt{6/N}$ and $\sqrt{24/N}$, respectively. Hence, if a sample is large, departures from normality may be tested by treating

$$\frac{\sqrt{\beta_1}}{\sqrt{6/N}} \qquad\text{and}\qquad \frac{\beta_2 - 3}{\sqrt{24/N}}$$

as normally distributed variates with zero mean and unit standard deviation.
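A minimal sketch of the large-sample tests just described, assuming the moments are computed with divisor N as elsewhere in the text. The function name and the example data are illustrative only: the integers 1 to 100 form a flat, symmetrical set, so they should show no skewness but a clear deficiency of kurtosis.

```python
import math

def moment_ratio_tests(xs):
    """Return sqrt(beta1), beta2 and their large-sample normal deviates (normal population assumed)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    sqrt_b1 = m3 / m2 ** 1.5          # carries the sign of the third moment
    b2 = m4 / m2 ** 2
    z_skew = sqrt_b1 / math.sqrt(6 / n)
    z_kurt = (b2 - 3) / math.sqrt(24 / n)
    return sqrt_b1, b2, z_skew, z_kurt

sqrt_b1, b2, z_skew, z_kurt = moment_ratio_tests(list(range(1, 101)))
print(f"sqrt(beta1) = {sqrt_b1:.4f}, beta2 = {b2:.4f}, z (skewness) = {z_skew:.2f}, z (kurtosis) = {z_kurt:.2f}")
```

A normal deviate of about -2.45 for kurtosis would be judged significant at ordinary levels, which is what a flat distribution should produce.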

Because of the complexity of the exact formulas for the means and standard deviations of the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$, R. A. Fisher³ suggests the use of certain “k

1 The range is the difference between the largest and smallest cases in a 
sample. 

2 L. H. C. Tippett, E. S. Pearson, and H. O. Hartley have been mainly 
responsible for deriving these data. See footnote to Table XIII in the 
Appendix. 

3 Statistical Methods for Research Workers, Appendix, Chap. III,
statistics” in place of the moments of a sample distribution and certain “g statistics” in place of the sample β statistics. The k statistics¹ are defined as follows, x denoting the deviation of an item from the sample mean:

$$k_2 = \frac{\sum x^2}{N-1}, \qquad k_3 = \frac{N\sum x^3}{(N-1)(N-2)}$$

$$k_4 = \frac{N\left[(N+1)\sum x^4 - \dfrac{3(N-1)}{N}\left(\sum x^2\right)^2\right]}{(N-1)(N-2)(N-3)}$$

The g statistics bear the same relationship to the k statistics as the β statistics do to the sample moments, viz.,

$$g_1 = \frac{k_3}{k_2^{3/2}}, \qquad g_2 = \frac{k_4}{k_2^2}$$

For large samples the sampling distributions of the g statistics tend to be normal in form, with means of zero and standard deviations given exactly by

$$\sigma_{g_1} = \sqrt{\frac{6N(N-1)}{(N-2)(N+1)(N+3)}}, \qquad \sigma_{g_2} = \sqrt{\frac{24N(N-1)^2}{(N-3)(N-2)(N+3)(N+5)}}$$

The use of the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ or $g_1$ and $g_2$ to test departures from normality will be discussed in the next chapter.
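The k and g statistics are straightforward to compute. The sketch below follows the definitions given above (with x the deviation from the sample mean) and the exact standard-deviation formulas for normal samples; the function names and the data used in the check (the integers 1 to 100) are illustrative only.

```python
import math

def k_and_g_statistics(xs):
    """Fisher's k2, k3, k4 and g1 = k3 / k2**1.5, g2 = k4 / k2**2 for a sample."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)   # sums of powers of deviations from the sample mean
    s3 = sum((x - mean) ** 3 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = n * ((n + 1) * s4 - 3 * (n - 1) * s2 ** 2 / n) / ((n - 1) * (n - 2) * (n - 3))
    return k2, k3, k4, k3 / k2 ** 1.5, k4 / k2 ** 2

def g_standard_errors(n):
    """Exact standard deviations of g1 and g2 for samples of n from a normal population."""
    se_g1 = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_g2 = math.sqrt(24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5)))
    return se_g1, se_g2

k2, k3, k4, g1, g2 = k_and_g_statistics(list(range(1, 101)))   # illustrative data only
se_g1, se_g2 = g_standard_errors(100)
print(f"g1 = {g1:.4f} (s.d. {se_g1:.4f}), g2 = {g2:.4f} (s.d. {se_g2:.4f})")
```

For N = 100 the exact standard deviations (about .241 and .478) are already close to the rough values $\sqrt{6/N}$ and $\sqrt{24/N}$.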

For small samples, the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ have been approximated by “fitting” Pearsonian curves having the same moments as those of the exact sampling distributions. For samples of 25 to 100, the 5 per cent and 10 per cent limits of the sampling distribution of $\sqrt{\beta_1}$ have been determined

1 The mean values of the k's are the “cumulants” of the population. Cf. pp. 83-84.
approximately by P. Williams.¹ These are reproduced in Table X of the Appendix.²


It is interesting to compare the limits of this table with those obtained by the assumption that $\sqrt{\beta_1}$ has a normal sampling distribution with a mean of zero and a standard deviation of $\sqrt{6/N}$. For example, for samples of 100, the 5 per cent point given by the table is 0.389, while the assumption of a normal sampling distribution gives 0.403. This suggests that for samples of 100 or more a normal sampling distribution gives fairly good results.


Approximate values for the upper and lower 1 per cent and 5 per cent points of the sampling distribution of $\beta_2$ have also been computed³ and are reproduced in Table XI of the Appendix.⁴ By comparison, the 5 per cent points derived from the assumption that $\beta_2$ is normally distributed with a mean of 3 and a standard deviation of $\sqrt{24/N}$ are 2.19 and 3.81 for samples of 100 and 2.64 and 3.36 for samples of 500. The table values are 2.35 and 3.77 for samples of 100 and 2.67 and 3.37 for samples of 500, which shows that for samples of 500 or more the assumption of a normal sampling distribution gives results that would appear to be fairly reliable. For samples of 100, the assumption of a normal sampling distribution gives a lower limit that is somewhat below that given by the table.
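The normal-approximation points quoted in the last two paragraphs follow from simple arithmetic. The sketch below assumes the single-tail 5 per cent point of the unit normal distribution, 1.6449, and reproduces 0.403 for $\sqrt{\beta_1}$ at N = 100 and the $\beta_2$ limits 2.19 and 3.81 (N = 100) and 2.64 and 3.36 (N = 500). The tabled values it is compared with in the text come from the fitted curves, not from this calculation.

```python
import math

Z_05 = 1.6449   # single-tail 5 per cent point of the unit normal distribution

def normal_approx_points(n):
    """5 per cent points for sqrt(beta1) (upper) and beta2 (lower, upper) under the normal approximation."""
    sd_b1 = math.sqrt(6 / n)
    sd_b2 = math.sqrt(24 / n)
    return Z_05 * sd_b1, (3 - Z_05 * sd_b2, 3 + Z_05 * sd_b2)

for n in (100, 500):
    upper_b1, (lo_b2, hi_b2) = normal_approx_points(n)
    print(f"N = {n}: sqrt(beta1) {upper_b1:.3f}; beta2 {lo_b2:.2f} and {hi_b2:.2f}")
```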

For practical use, Tables X and XI of the Appendix have been put in chart form.⁵ This permits ready interpolation for values not listed in the tables.

To test departure from normality with respect to kurtosis, tables have also been constructed for the sampling distribution of the statistic

$$a = \frac{\text{A.D. (calculated from the mean)}}{\sigma}$$
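A minimal sketch of the statistic a, assuming (as for σ elsewhere in this chapter) that the standard deviation is computed with divisor N. For a normal population the ratio of mean deviation to standard deviation is $\sqrt{2/\pi} \approx 0.798$; flat-topped distributions give larger values (about 0.866 for a uniform distribution) and sharply peaked ones smaller. The simulated samples and seed are arbitrary, and the function name is illustrative.

```python
import math
import random

def geary_a(xs):
    """Mean deviation about the mean divided by the standard deviation (divisor N)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_dev = sum(abs(x - mean) for x in xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean_dev / sd

rng = random.Random(5)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
uniform_sample = [rng.uniform(0.0, 1.0) for _ in range(20_000)]
a_normal = geary_a(normal_sample)
a_uniform = geary_a(uniform_sample)
print(f"a (normal) = {a_normal:.3f}; sqrt(2/pi) = {math.sqrt(2 / math.pi):.3f}; a (uniform) = {a_uniform:.3f}")
```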


Whereas the sampling distribution of $\beta_2$ is quite skewed for N < 200 and the accuracy of the tabled probability levels for

1 Williams, P., “Note on the Sampling Distribution of $\sqrt{\beta_1}$, where the Population Is Normal,” Biometrika, Vol. 27 (1935), pp. 269-271.

2 See p. 480. 

3 Pearson, E. S., “A Further Development of Tests for Normality,” 
Biometrika, Vol. 22 (1930-1931), pp. 239-249. 

4 See p. 480. 

5 Geary, R. C., and E. S. Pearson, Tests of Normality, pp. 13-15.
$\beta_2$ cannot be determined without further investigation, the approximation involved in the calculation of probability levels for a is very close, even when N is as small as 10. For this reason the use of a in place of $\beta_2$ to test the existence of kurtosis is recommended. Probability levels for a have been worked out and are given in Table XII of the Appendix. The use of this table will be illustrated in the next chapter.

SUMMARY 

The fundamental basis of any sampling analysis consists in a comparison of a sample with the set of all possible samples that may be derived from some hypothetical population. The basis of the comparison will vary from problem to problem. The present chapter, which is concerned with sampling from a normal population, has discussed four principal sample measurements that might be used to make this comparison. These are the mean of the sample, the variance and standard deviation of the sample, and the sample statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$.

When all possible samples from a normal population are considered with reference to their mean values, it is found that the set of sample means forms a normal frequency distribution, the mean of which is the mean of the population and the variance of which is the variance of the population divided by N. This distribution of the sample means is called the “sampling distribution of the mean.” It is to be noted that its shape and position depend on the values of the population mean and variance.

When the set of all possible samples is described in terms of the variances of the samples, it is found that the set of sample variances makes up a skewed frequency distribution, which, if $\tilde\sigma^2/N$ is taken as the unit of measurement, is of the form of the $\chi^2$ distribution, with n in the $\chi^2$ formula equal to $N - 1$. This is the “sampling distribution of the variance.” The shape and position of this distribution depend only on the value of the population variance and are independent of the value of the population mean. If the sample is large, say 30 or more, the

1 Geary, R. C., “Moments of the Ratio of the Mean Deviation to the Standard Deviation for Normal Samples,” Biometrika, Vol. 28 (1936), pp. 295-307.
sampling distribution of the variance is almost normal in form, the mean of the distribution being approximately equal to the variance of the population and the standard deviation of the distribution being approximately equal to the variance of the population times $\sqrt{2/N}$.

When the set of all possible samples is described in terms of the sample values of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$, it is found that these sample values make up a frequency distribution which is of the form of the t distribution, with n equal to $N - 1$. This is the “sampling distribution” of the statistic $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$. Since the form of the t distribution depends only on the value of n (here equal to $N - 1$), it follows that the sampling distribution of $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ is independent of the values of the population mean and variance. For large samples, say 30 or more, the t distribution is almost normal, with a mean of zero and a standard deviation of approximately unity.

Attention was also called to the sampling distributions of such statistics as the median and the range, $\sqrt{\beta_1}$ and $\beta_2$, $g_1$ and $g_2$, and a.

APPENDIX

It is the purpose of this appendix to derive the sampling distributions of the mean, the variance, and $\sqrt{N-1}\,(\bar X - \tilde X)/\sigma$ for a sample of 2 cases and for a sample of N cases from a normal population. The argument will be the same as in the text but will be algebraic and hence general, instead of arithmetical and particular.

SPECIAL CASE N = 2 

Distribution of All Possible Samples of Two from a Normal Population. Derivation. Let the first case in a sample be represented by $X_1$ and the second by $X_2$. Since the population is normal, the probability of $X_1$ lying in the interval $X_1$ to $X_1 + dX_1$ is

$$dP(X_1) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_1 - \tilde X)^2}{2\tilde\sigma^2}\right] dX_1 \tag{1}$$
and the probability of $X_2$ lying in the interval $X_2$ to $X_2 + dX_2$ is

$$dP(X_2) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_2 - \tilde X)^2}{2\tilde\sigma^2}\right] dX_2 \tag{2}$$


If samples of two are drawn at random from the population, the law of large numbers suggests that the relative frequency with which samples will have one case in the interval $X_1$ to $X_1 + dX_1$ and the other in the interval $X_2$ to $X_2 + dX_2$ may be predicted by the joint probability of $X_1$ and $X_2$, that is, by the product of Eqs. (1) and (2); thus

$$dP(X_1, X_2) = \frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right] dX_1\,dX_2 \tag{3}$$

Equation (3) is the mathematical formula for the distribution of all possible samples of two from a normal population. It is the model that any large number of actual random samples of two will tend to approximate.

Geometrical Properties. If values of $X_1$ and $X_2$ for any sample are taken as the coordinates of a point in a plane, then the set of all possible samples will be represented by a cluster of points in this plane, and their distribution over the plane will be given by Eq. (3). The factor $dX_1\,dX_2$ of this equation will represent the area of the plane containing the specified sample values, and

$$\frac{1}{2\pi\tilde\sigma^2}\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]$$

will represent the “density” of sample points in this area. Since the density factor increases as $X_1$ and $X_2$ approach $\tilde X$, it follows that the greatest concentration of sample points is in the immediate neighborhood of the point $(\tilde X, \tilde X)$. Furthermore, since

$$(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2$$

measures the square of the distance from $(\tilde X, \tilde X)$, it follows that all sections of the plane equally distant from $(\tilde X, \tilde X)$ will have the same density of points and that the density decreases as the distance from $(\tilde X, \tilde X)$ increases. The distribution of all samples of two thus has a circular symmetry about the central point $(\tilde X, \tilde X)$.

The point $(\tilde X, \tilde X)$, it will be noted, lies on the line through the origin that bisects the angle between the axes. This line (line CD of Fig. 64 or Figs. 65a and 65b, if projected to pass through the point where $X_1 = 0$ and $X_2 = 0$) is the locus of all samples that have two equal values and thus has the equation $X_1 = X_2$, or $X_1 - X_2 = 0$.

Now the mean of any sample (N = 2) is $(X_1 + X_2)/2 = \bar X$. But this is the equation of a line that cuts the $X_1$ axis at $2\bar X$ and the $X_2$ axis at $2\bar X$ and is perpendicular to the line CD, cutting that line at $(\bar X, \bar X)$. Such a line is line AB of Fig. 64. All points on line AB have the same mean. It then follows that the mean of any sample can be found by finding where the line through it, perpendicular to CD, cuts CD. Hence variation in mean values from sample to sample can be represented by variation along CD.
Also, since the variance of a sample $X_1, X_2$ is by definition

$$\sigma^2 = \frac{(X_1 - \bar X)^2 + (X_2 - \bar X)^2}{2}$$

and since $(X_1 - \bar X)^2 + (X_2 - \bar X)^2$ represents the square of the distance from the sample point $(X_1, X_2)$ to the mean point $(\bar X, \bar X)$, it follows that all sample points equidistant from their mean points have the same variance. But all mean points $(\bar X, \bar X)$ lie on the line CD, and each is the foot of the perpendicular dropped from its sample point to CD; hence all points equidistant from line CD have the same variance.¹ Loci of equal variances are accordingly lines, like lines EF and E'F' of Fig. 64, that are parallel to line CD. Variation in sample variances is thus measured by variation perpendicular to these lines.
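The geometric facts used above can be checked numerically for the two samples discussed earlier in the chapter, (98, 94) and (100, 92). The sketch treats CD as the line $X_1 = X_2$ through the origin; the function name is hypothetical. It confirms that the distance from a sample point to its mean point equals its perpendicular distance from CD, and that the variance is one-half the square of that distance.

```python
import math

def geometry_of_sample(x1, x2):
    """Mean, distance to the mean point, distance from the line X1 = X2, and variance (divisor 2)."""
    mean = (x1 + x2) / 2
    # Distance from (x1, x2) to its mean point (mean, mean), the foot of the perpendicular on CD.
    dist_to_mean_point = math.hypot(x1 - mean, x2 - mean)
    # Standard point-to-line distance for the line x1 - x2 = 0.
    dist_from_cd = abs(x1 - x2) / math.sqrt(2)
    variance = ((x1 - mean) ** 2 + (x2 - mean) ** 2) / 2
    return mean, dist_to_mean_point, dist_from_cd, variance

for sample in [(98, 94), (100, 92)]:
    mean, d_mean, d_cd, var = geometry_of_sample(*sample)
    print(f"{sample}: mean {mean}, distances {d_mean:.4f} and {d_cd:.4f}, "
          f"variance {var}, half squared distance {d_cd ** 2 / 2:.4f}")
```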


Distribution of All Possible Samples in Terms of Their Means and Variances. From the foregoing analysis, it follows that the plane of Fig. 64 can be divided into cells indicating variation in $\bar X$ and in $\sigma^2$ or $\sigma$, in lieu of cells indicating variation in $X_1$ and $X_2$. This might be done as follows:

Select any dX 1 interval on the X x axis and draw perpendicular 
fines through each end of this interval; likewise select any dX, 
interval on the X 2 axis (equal to dXJ and draw fines through 
the end points perpendicular to the X 2 axis. Let these two pairs 
of parallel lines mark out the cell abed of Fig. 70. The top and 
bottom of this cell will be dX 2 and the sides dX x . This cell is 
like cell abed in Fig. 63 on page 222. Next draw fines through 
o and c perpendicularto CD. These will mark off an interval on 
CD that will equal dX \/2; since X = (Xi -(- X 2 )/2, 


— (dX i dX 2 )/2, 

and because dX x = dX 2 , dX = dX, or dX 2 . Hence, in Fig. 70, 
1 See p. 247. 
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fg = ab^s/2 = dX 2 y/2 * dX \/2. Finally, draw lines through 
b and d parallel to CD and call the perpendicular distance between 
them d (7 V2; the variance, it will be recalled from page 227, 
equals one-half the square of the distance from CD. 

From the cell abcd there has thus been derived the new cell efgh of size $d\bar X\sqrt2 \cdot d\sigma\sqrt2 = 2\,d\bar X\,d\sigma$. When the whole of the plane has been cut up into cells like efgh, the plane will become a grid based on $\bar X$ and $\sigma$ intervals instead of $X_1$ and $X_2$ intervals.

Fig. 70.
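The factor 2 relating the two grids can be checked numerically. The sketch below is an illustration added here, not part of the original derivation: it assumes the mapping $(X_1, X_2) \to (\bar X, \sigma)$ with $\sigma = |X_1 - X_2|/2$ and estimates the Jacobian determinant by central differences; the function names are hypothetical. A determinant of magnitude ½ means that a cell of size $d\bar X\,d\sigma$ corresponds to an area $2\,d\bar X\,d\sigma$ in the $X_1, X_2$ plane on one side of CD, in agreement with the size of cell efgh.

```python
def mean_and_sd(x1, x2):
    """Sample mean and standard deviation (divisor N = 2) of the pair x1, x2."""
    xbar = (x1 + x2) / 2
    sigma = abs(x1 - x2) / 2
    return xbar, sigma


def jacobian_det(x1, x2, h=1e-6):
    """Central-difference estimate of det d(xbar, sigma) / d(x1, x2).

    Valid away from the line CD (x1 == x2), where sigma is not differentiable.
    """
    m_p1, s_p1 = mean_and_sd(x1 + h, x2)
    m_m1, s_m1 = mean_and_sd(x1 - h, x2)
    m_p2, s_p2 = mean_and_sd(x1, x2 + h)
    m_m2, s_m2 = mean_and_sd(x1, x2 - h)
    dm_dx1 = (m_p1 - m_m1) / (2 * h)
    ds_dx1 = (s_p1 - s_m1) / (2 * h)
    dm_dx2 = (m_p2 - m_m2) / (2 * h)
    ds_dx2 = (s_p2 - s_m2) / (2 * h)
    return dm_dx1 * ds_dx2 - dm_dx2 * ds_dx1


det = jacobian_det(3.0, 1.0)
area_per_cell = 1 / abs(det)   # X1, X2 area per unit d(xbar) d(sigma), one side of CD
print(round(area_per_cell, 6))  # 2.0
```

Counting the matching cell on the other side of CD doubles this to the area factor $4\,d\bar X\,d\sigma$ used below.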

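In Eq. (3) the factor that measures the variation in density of sample points from one part of the plane to the other is the quantity $\exp\left[-\frac{(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]$, where $\tilde X$ and $\tilde\sigma$ denote the mean and standard deviation of the population. But the numerator of the exponent can be evaluated as follows:¹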


¹ For $N\sigma^2 = \sum d^2 - NC^2$, in which $d = X_i - \tilde X$ and $C = \bar X - \tilde X$. This is merely a version of the "short formula" for finding the standard deviation. In this application, it is to be remembered, N = 2.


$$(X_1 - \tilde X)^2 + (X_2 - \tilde X)^2 = (X_1 - \bar X)^2 + (X_2 - \bar X)^2 + 2(\bar X - \tilde X)^2 = 2\sigma^2 + 2(\bar X - \tilde X)^2$$

This indicates that it is possible also to measure the variation in density of sample points in terms of $\bar X$ and $\sigma$.
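The identity above holds for any sample size, not only N = 2: the sum of squared deviations from the population mean splits into $N\sigma^2$ plus $N(\bar X - \tilde X)^2$. The following sketch checks this numerically; the population values 50 and 10 and the sample size 7 are arbitrary choices for illustration, and the function name is hypothetical.

```python
import random


def split_sum_of_squares(xs, pop_mean):
    """Return (sum (X_i - pop_mean)^2, N * sigma^2, N * (xbar - pop_mean)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    total = sum((x - pop_mean) ** 2 for x in xs)
    within = sum((x - xbar) ** 2 for x in xs)   # N * sigma^2, sigma with divisor N
    between = n * (xbar - pop_mean) ** 2
    return total, within, between


rng = random.Random(1)
sample = [rng.gauss(50, 10) for _ in range(7)]
total, within, between = split_sum_of_squares(sample, pop_mean=50)
print(total, within + between)   # equal up to rounding error
```

For the two-case sample 3, 1 with population mean 0 the three quantities are 10, 2, and 8, which is the $2\sigma^2 + 2(\bar X - \tilde X)^2$ form of the text with $\sigma = 1$ and $\bar X = 2$.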

This now permits the complete expression of the distribution of all samples in terms of $\bar X$ and $\sigma$. For from the foregoing it follows that the probability of a sample having a mean lying between $\bar X$ and $\bar X + d\bar X$ and a $\sigma$ lying between $\sigma$ and $\sigma + d\sigma$ is equal to the density factor for the areas containing the specified values of $\bar X$ and $\sigma$ times the size of the area. The density factor is

$$\frac{1}{(\tilde\sigma\sqrt{2\pi})^2}\exp\left[-\frac{2\sigma^2 + 2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$


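The area factor is double the area of square efgh in Fig. 70; i.e., double $2\,d\bar X\,d\sigma$, or $4\,d\bar X\,d\sigma$. The doubling of the area of efgh results from the fact that there is another square, e'f'g'h', of the same size on the opposite side of line CD that also contains samples having the same mean and standard deviation as samples in square efgh. Since probability equals density factor times area, it follows that the probability of a sample having an $\bar X$ between $\bar X$ and $\bar X + d\bar X$ and a $\sigma$ between $\sigma$ and $\sigma + d\sigma$ is equal to

$$dP(\bar X,\sigma) = \frac{1}{(\tilde\sigma\sqrt{2\pi})^2}\exp\left[-\frac{2\sigma^2 + 2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]4\,d\bar X\,d\sigma$$

$$= \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt2}\exp\left[-\frac{2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X \cdot \frac{2\sqrt2}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma \tag{4}$$

or, since $2\sigma\,d\sigma = d\sigma^2$,

$$dP(\bar X,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt2}\exp\left[-\frac{2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X \cdot \frac{\sqrt2}{\sigma\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \tag{4'}$$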


Equation (4') is the equation for the distribution of all sample 
points in terms of their means and variances. 

Distribution of All Sample Means. Equation (4) gives the probability of a sample falling in either square efgh or square e'f'g'h' of Fig. 70. It is the probability of a sample having a mean between $\bar X$ and $\bar X + d\bar X$ and a standard deviation between $\sigma$ and $\sigma + d\sigma$. To find the probability of a sample having a mean between $\bar X$ and $\bar X + d\bar X$ and a standard deviation of any value, it is merely necessary, by the addition theorem, to sum probabilities like Eq. (4) for all values of the standard deviation. This can be done with reference to Fig. 64 by summing up the probabilities of all cells lying between AB and A'B'. Algebraically this sum is found by integrating Eq. (4) with respect to $\sigma$. Thus


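$$dP(\bar X) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt2}\exp\left[-\frac{2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X\int_0^\infty \frac{2\sqrt2}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma$$

But the integral is of the form

$$\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-z^2/2}\,dz$$

where $z = \sqrt2\,\sigma/\tilde\sigma$, and it is equal to 1, since it is double one-half the area under the standard normal curve. Hence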


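$$dP(\bar X) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt2}\exp\left[-\frac{2(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X = \frac{1}{\tilde\sigma_1\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma_1^2}\right]d\bar X \tag{5}$$

in which $\tilde\sigma_1 = \tilde\sigma/\sqrt2$.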

This shows that, for a normal population, the distribution of 
means of samples of two is normal in form with a mean equal to 
the mean of the population and a variance equal to the variance 
of the population divided by two. 
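A quick simulation makes Eq. (5) concrete. The sketch below (Python standard library only) draws many samples of two from a normal population with an assumed mean of 100 and standard deviation of 15, and compares the mean and variance of the sample means with $\tilde X$ and $\tilde\sigma^2/2 = 112.5$. It is a Monte Carlo illustration, not a proof; agreement holds only to within sampling error.

```python
import random
import statistics


def means_of_pairs(pop_mean, pop_sd, trials, seed=0):
    """Means of `trials` independent samples of two from a normal population."""
    rng = random.Random(seed)
    return [(rng.gauss(pop_mean, pop_sd) + rng.gauss(pop_mean, pop_sd)) / 2
            for _ in range(trials)]


means = means_of_pairs(100.0, 15.0, 200_000)
print(statistics.fmean(means))        # close to 100
print(statistics.pvariance(means))    # close to 15**2 / 2 = 112.5
```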

Distribution of All Sample Variances. The same process may be used to find the distribution of all sample variances. In this instance, the probabilities of all cells lying between lines GH and EF and lines G'H' and E'F' are summed. Algebraically the distribution is found by integrating (4') with respect to $\bar X$. Thus


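$$dP(\sigma^2) = \frac{\sqrt2}{\sigma\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2\int_{-\infty}^{\infty}\frac{1}{\tilde\sigma_1\sqrt{2\pi}}\exp\left[-\frac{(\bar X - \tilde X)^2}{2\tilde\sigma_1^2}\right]d\bar X$$

But the integral gives the total area under the normal curve and thus equals 1. Hence

$$dP(\sigma^2) = \frac{1}{\sigma\tilde\sigma\sqrt\pi}\exp\left[-\frac{2\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \tag{6}$$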

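If $\chi^2$ is set equal to $2\sigma^2/\tilde\sigma^2$, and it is noted that $\sqrt\pi = \Gamma(\tfrac12)$, this becomes

$$dP(\chi^2) = \frac{1}{2^{1/2}\,\Gamma(\tfrac12)}(\chi^2)^{-1/2}e^{-\chi^2/2}\,d\chi^2$$

which shows that for samples of two from a normal population the sampling distribution of $2\sigma^2/\tilde\sigma^2$ has the form of a $\chi^2$ distribution with $n = 2 - 1 = 1$.¹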
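The $\chi^2$ result can be checked the same way. Under the assumptions of the text (samples of two, variance computed with divisor N = 2), the sketch below computes $2\sigma^2/\tilde\sigma^2$ for many simulated samples and compares the share falling at or below 1 with the $\chi^2$ probability for n = 1, which equals $P(|z| \le 1) = \operatorname{erf}(1/\sqrt2) \approx .6827$. The population values are arbitrary, and the function name is hypothetical.

```python
import math
import random


def chi_square_pairs(pop_mean, pop_sd, trials, seed=0):
    """2 * sigma^2 / pop_sd^2 for samples of two, sigma^2 with divisor N = 2."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        x1 = rng.gauss(pop_mean, pop_sd)
        x2 = rng.gauss(pop_mean, pop_sd)
        xbar = (x1 + x2) / 2
        var = ((x1 - xbar) ** 2 + (x2 - xbar) ** 2) / 2
        values.append(2 * var / pop_sd ** 2)
    return values


values = chi_square_pairs(0.0, 3.0, 200_000)
share_at_most_1 = sum(v <= 1 for v in values) / len(values)
reference = math.erf(1 / math.sqrt(2))   # P(chi-square with n = 1 <= 1)
print(share_at_most_1, reference)
```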

Distribution of All Sample Values of $t = \sqrt2(\bar X - \tilde X)/s$. To find the distribution of sample values of $t = \sqrt2(\bar X - \tilde X)/s$, note that by definition $s = \sigma\sqrt{N/(N-1)}$, or, when N = 2, $s = \sigma\sqrt2$. Hence for N = 2, $t = (\bar X - \tilde X)/\sigma$ and $\sigma\,dt = d\bar X$.


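Substituting these values in (4') gives

$$dP(t,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt2}\exp\left[-\frac{\sigma^2t^2}{\tilde\sigma^2}\right]\sigma\,dt \cdot \frac{\sqrt2}{\sigma\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{\sigma^2}{\tilde\sigma^2}\right]d\sigma^2$$

Combining terms yields

$$dP(t,\sigma^2) = \frac{1}{\pi\tilde\sigma^2}\exp\left[-\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]dt\,d\sigma^2$$

which may be put in the form

$$dP(t,\sigma^2) = \frac{dt}{\pi(1 + t^2)}\exp\left[-\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]d\left[\frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)\right]$$

To get the distribution of t, integrate this for all values of $\sigma^2$ from 0 to ∞. Thus,

$$dP(t) = \frac{dt}{\pi(1 + t^2)}\int_0^\infty e^{-y}\,dy, \qquad\text{where } y = \frac{\sigma^2}{\tilde\sigma^2}(1 + t^2)$$

¹ Cf. equation for the $\chi^2$ curve, page 111.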
But the integral equals $\left[-e^{-y}\right]_0^\infty$, which equals 1. Hence

$$dP(t) = \frac{dt}{\pi(1 + t^2)}$$

which is seen to be of the form of the t distribution with $n = N - 1 = 2 - 1 = 1$. Hence $(\bar X - \tilde X)/\sigma$ is distributed like t with n = 1.¹
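For n = 1 the density $dt/\pi(1 + t^2)$ integrates to $P(|t| \le c) = 2\arctan(c)/\pi$. The sketch below, assuming an arbitrary normal population (mean 10, standard deviation 2), computes $t = (\bar X - \tilde X)/\sigma$ for simulated samples of two and compares the observed shares with that formula. It is an illustration under these assumptions only; the helper names are hypothetical.

```python
import math
import random


def t_from_pairs(pop_mean, pop_sd, trials, seed=0):
    """t = (xbar - pop_mean) / sigma for samples of two (sigma with divisor 2)."""
    rng = random.Random(seed)
    ts = []
    while len(ts) < trials:
        x1 = rng.gauss(pop_mean, pop_sd)
        x2 = rng.gauss(pop_mean, pop_sd)
        sigma = abs(x1 - x2) / 2
        if sigma == 0:          # probability zero, but avoid dividing by zero
            continue
        ts.append(((x1 + x2) / 2 - pop_mean) / sigma)
    return ts


def share_within(ts, c):
    """Observed share of |t| <= c."""
    return sum(abs(t) <= c for t in ts) / len(ts)


ts = t_from_pairs(10.0, 2.0, 200_000)
for c in (1.0, 3.0):
    print(c, share_within(ts, c), 2 * math.atan(c) / math.pi)
```

The heavy tails of this distribution (about 20 per cent of the t values exceed 3 in absolute value) show why samples of two say so little about the population mean.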

THE GENERAL CASE N > 2 

Before proceeding to the general case, it will be well to prepare 
the ground by reviewing several mathematical functions and 
relationships. 

N-dimensional Geometry. Those who have had coordinate geometry will remember that a linear equation in two variables may be represented by a straight line drawn on the coordinate plane. If the equation is put in the form $aX_1 + bX_2 + c = 0$, $-c/b$ is the $X_2$ intercept, $-c/a$ is the $X_1$ intercept, and $-a/b$ is the slope of the line with the $X_1$-axis. The "direction ratios" of the line, i.e., factors that are proportional to the cosines of the angles the line makes with the $X_1$- and $X_2$-axes, are $b$ and $-a$. Thus a line that is perpendicular to $aX_1 + bX_2 + c = 0$ would be one whose slope equals $b/a$ and whose direction ratios are $a$ and $b$. In particular, a line through the origin that made equal angles with the axes would be $X_1 - X_2 = 0$, and a line perpendicular to this line would be $X_1 + X_2 = c$. Similarly, in three dimensions, a linear equation of the form

$$aX_1 + bX_2 + dX_3 + c = 0$$

represents a plane, and $a$, $b$, $d$ are the direction ratios of a line perpendicular to the plane. If this is generalized, it may be said that a linear equation of the form

$$aX_1 + bX_2 + \cdots + hX_N + c = 0$$

represents a hyperplane, or flat space, in N dimensions, and $a, b, \ldots, h$ are the direction ratios of a line perpendicular to this hyperplane. Just as a line in a two-dimensional plane is said to have one dimension (i.e., length) and a plane in three dimensions is said to have two dimensions (i.e., length and breadth), a hyperplane in N dimensions is said to have N − 1 dimensions. It is to be noted that this use of N-dimensional space is merely a convenient way of generalizing certain algebraic relationships that have concrete counterparts in two and three dimensions. There can be no real N-dimensional figures.

¹ Cf. equation for the t distribution, p. 111.

This generalization of two- and three-dimensional relationships may be extended to other figures. Thus, in two dimensions, $(X_1 - a)^2 + (X_2 - b)^2 = r^2$ is the equation for a circle with center at $(a, b)$ and radius equal to $r$; in three dimensions, $(X_1 - a)^2 + (X_2 - b)^2 + (X_3 - c)^2 = r^2$ is a sphere with center at $(a, b, c)$ and a radius equal to $r$. If these relationships are generalized, it may be said that

$$(X_1 - a)^2 + (X_2 - b)^2 + \cdots + (X_N - k)^2 = r^2$$

represents a hypersphere in N-dimensional space with center at $(a, b, \ldots, k)$ and radius equal to $r$. The circumference of a circle is of one dimension (being only a line), the surface of a sphere is of two dimensions, and the surface of a hypersphere in N dimensions will be of N − 1 dimensions. These N-dimensional notions will be useful in the analysis that follows.

The Gamma Function. The integral

$$\Gamma(m) = \int_0^\infty e^{-x}x^{m-1}\,dx, \qquad m > 0 \tag{7}$$

is known as the "gamma function." Integration by parts shows that the integral on the right equals

$$\left[-e^{-x}x^{m-1}\right]_0^\infty + \int_0^\infty e^{-x}(m - 1)x^{m-2}\,dx \tag{8}$$

But the first term of Eq. (8) equals 0 when x = 0 (for m > 1) and also when x = ∞, and the second term is seen to be equal to $(m - 1)\Gamma(m - 1)$. Hence we have the relationship

$$\Gamma(m) = (m - 1)\Gamma(m - 1) \tag{9}$$

By Eq. (9), $\Gamma(m - 1) = (m - 2)\Gamma(m - 2)$, etc.; therefore, if m is a positive integer,¹

$$\Gamma(m) = (m - 1)(m - 2)(m - 3)\cdots 1 = (m - 1)! \tag{10}$$

Equation (10) holds for positive integral values of m. For fractional values of m, it is taken as the definition of m!. Thus, in general, m! is defined by

$$m! = \Gamma(m + 1) \tag{10'}$$

¹ The last term will be $\Gamma(1) = \int_0^\infty e^{-x}\,dx = \left[-e^{-x}\right]_0^\infty = e^0 = 1$.

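In statistical theory, $\Gamma(\tfrac12)$ is of particular interest; for if m is set equal to ½ in Eq. (7), it becomes

$$\Gamma(\tfrac12) = \int_0^\infty e^{-x}x^{-1/2}\,dx$$

By setting $x = y^2/2\sigma^2$, this may be put in the form

$$\Gamma(\tfrac12) = \frac{\sqrt2}{\sigma}\int_0^\infty e^{-y^2/2\sigma^2}\,dy$$

If the right member is multiplied by $\sqrt{2\pi}/\sqrt{2\pi}$, this becomes

$$\Gamma(\tfrac12) = 2\sqrt\pi\int_0^\infty \frac{1}{\sigma\sqrt{2\pi}}e^{-y^2/2\sigma^2}\,dy \tag{11}$$

But the integral is recognized as half the area under a normal frequency curve and is thus equal to ½. Hence, Eq. (11) reduces to¹

$$\Gamma(\tfrac12) = \sqrt\pi \tag{12}$$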
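Equations (9), (10), and (12) are easy to check with Python's `math.gamma`. The sketch below also evaluates $\Gamma(\tfrac12)$ through the normal-curve integral used in deriving Eq. (11), taking σ = 1 and a midpoint rule truncated at y = 12; both choices are assumptions made for the illustration, and the neglected tail is far below the precision shown.

```python
import math

# Eq. (10): Gamma(m) = (m - 1)! for positive integers m
for m in range(1, 8):
    assert math.isclose(math.gamma(m), math.factorial(m - 1))

# Eq. (9): Gamma(m) = (m - 1) * Gamma(m - 1), here for a fractional m
assert math.isclose(math.gamma(2.5), 1.5 * math.gamma(1.5))


def gamma_half_via_normal(upper=12.0, steps=100_000):
    """sqrt(2) * integral_0^upper exp(-y^2 / 2) dy: the form before Eq. (11), sigma = 1."""
    h = upper / steps
    integral = h * sum(math.exp(-0.5 * ((k + 0.5) * h) ** 2) for k in range(steps))
    return math.sqrt(2) * integral


print(gamma_half_via_normal(), math.sqrt(math.pi))   # both about 1.7724539
```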

In conclusion, it may be noted that

$$\int_0^x e^{-u}u^{m-1}\,du$$

is known as the "incomplete gamma function" (incomplete because the integral is from 0 to x instead of from 0 to ∞). Tables of this function have been computed by Karl Pearson and his staff and have been published by the Cambridge University Press.
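In place of tables, the incomplete gamma integral can also be computed directly. The sketch below uses the standard power series $\int_0^x e^{-u}u^{m-1}\,du = x^m e^{-x}\sum_{k\ge0} x^k/[m(m+1)\cdots(m+k)]$; this method is our choice for illustration and is not claimed to be the procedure behind Pearson's tables. For m = 1 the result reduces to $1 - e^{-x}$, and as x grows it approaches $\Gamma(m)$.

```python
import math


def lower_incomplete_gamma(m, x, rel_tol=1e-16, max_terms=100_000):
    """Integral of e^(-u) u^(m-1) from 0 to x (m > 0, x >= 0), by power series."""
    if x <= 0:
        return 0.0
    term = 1.0 / m          # k = 0 term: 1 / m
    total = term
    k = 1
    while k < max_terms:
        term *= x / (m + k)  # now x^k / (m (m+1) ... (m+k))
        total += term
        if term < rel_tol * total:
            break
        k += 1
    return math.exp(m * math.log(x) - x) * total


print(lower_incomplete_gamma(1, 2.0), 1 - math.exp(-2.0))
print(lower_incomplete_gamma(0.5, 50.0), math.sqrt(math.pi))
```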

Distribution of All Possible Samples. Derivation. The first step in deriving the sampling distributions of means, variances, etc., is to obtain the distribution of all possible samples of N from a normal population. This may be accomplished by direct application of the multiplication theorem.

¹ The integral $\int_0^\infty \frac{1}{\sigma\sqrt{2\pi}}e^{-y^2/2\sigma^2}\,dy$ can be evaluated without making use of knowledge of the normal curve. See, for example, John F. Kenney, Mathematics of Statistics (1939), Part II, pp. 35-37.

The probability of a case lying between $X_1$ and $X_1 + dX_1$ is

$$dP(X_1) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_1 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_1 \tag{13}$$

Similarly, the probability of a case lying between $X_2$ and $X_2 + dX_2$ is

$$dP(X_2) = \frac{1}{\tilde\sigma\sqrt{2\pi}}\exp\left[-\frac{(X_2 - \tilde X)^2}{2\tilde\sigma^2}\right]dX_2 \tag{13'}$$

and the same sort of equation holds for the probability of $X_3, X_4, \ldots, X_N$.

If samples of N are drawn at random from an infinite normal population, the law of large numbers suggests that the relative frequency, or probability, of a sample in which one case lies between $X_1$ and $X_1 + dX_1$, another between $X_2$ and $X_2 + dX_2$, a third between $X_3$ and $X_3 + dX_3$, etc., will be predicted by the joint probability of $X_1, X_2, X_3, \ldots, X_N$, that is, by

$$dP(X_1, X_2, \ldots, X_N) = \frac{1}{(\tilde\sigma\sqrt{2\pi})^N}\exp\left[-\frac{\sum(X_i - \tilde X)^2}{2\tilde\sigma^2}\right](dX_1)(dX_2)\cdots(dX_N) \tag{14}$$


Equation (14) is the equation for the distribution of all possible 
samples of N from a normal population and is the model that 
any large number of samples of N will tend to approximate. 

Geometrical Representation and Properties. If the sample values of $X_1, X_2, \ldots, X_N$ are represented by a point in N-dimensional space with coordinates $X_1, X_2, \ldots, X_N$, then Eq. (14) represents a cluster of sample points with center at $\tilde X, \tilde X, \ldots, \tilde X$ (see Fig. 71a).

Geometrically Eq. (14) says that in any N-dimensional cell whose sides extend from $X_1$ to $X_1 + dX_1$, $X_2$ to $X_2 + dX_2$, ..., $X_N$ to $X_N + dX_N$ the density of sample points is given by

$$\frac{1}{(\tilde\sigma\sqrt{2\pi})^N}\exp\left[-\frac{\sum(X_i - \tilde X)^2}{2\tilde\sigma^2}\right]$$

and the probability of a sample falling in this cell is equal to the product of this density factor times the volume of the cell, viz., $dX_1\,dX_2\,dX_3\cdots dX_N$. From this interpretation it follows that the density of sample points is greatest when $X_1, X_2, \ldots, X_N$ all equal $\tilde X$, that is, when the sample point lies at the center of the cluster. Since $\sum(X_i - \tilde X)^2$ equals the square of the distance of a sample point from the center point $\tilde X, \tilde X, \ldots, \tilde X$, Eq. (14) also shows that the density of sample points is constant for all cells lying on a hypersphere with center at $\tilde X, \tilde X, \ldots, \tilde X$ and radius equal to $\sqrt{\sum(X_i - \tilde X)^2}$.



$$\frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \bar X$$

For a given value of $\bar X$ this defines a hyperplane that is perpendicular to the line OM since its direction ratios are all equal. In other words, all samples that have the same mean lie on a hyperplane that is perpendicular to line OM and cuts this line in the point $\bar X, \bar X, \ldots, \bar X$. It follows that variation in the mean of a sample can be represented by variation along the line OM.
The variance of a sample is $\sigma^2 = \sum(X_i - \bar X)^2/N$. Thus $N\sigma^2$ represents the square of the distance of a sample point from the point lying on OM whose coordinates are all equal to the mean of the sample. All samples with the same variance thus lie on a hypercylinder whose axis is the line OM. It follows that variation in the variance of a sample can be represented by variation in the square of the perpendicular distance of a sample point from the line OM.

Distribution of All Possible Samples in Terms of Their Means and Variances. It is suggested by the foregoing analysis that the distribution of all possible samples may be given in terms of the means and variances of the samples provided that the proper element of volume is found. As indicated above, the probability of a sample is the same for all "cubical cells" lying on the surface of a hypersphere, the probability being proportional to $\exp\left[-\sum(X_i - \tilde X)^2/2\tilde\sigma^2\right]$. If $\sum(X_i - \tilde X)^2$ is set equal to $S^2$, this means that the density of sample points is constant throughout a shell with center at $\tilde X, \tilde X, \ldots, \tilde X$, radius equal to S, and thickness equal to dS. But¹

$$\sum(X_i - \tilde X)^2 = \sum(X_i - \bar X)^2 + N(\bar X - \tilde X)^2 = N\sigma^2 + N(\bar X - \tilde X)^2$$

Hence the probability of samples for all cells lying in a given shell is proportional to

$$\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$


that is, to the product of a function depending on the variance of a sample by a function depending on the mean of the sample. If the shell in question is cut by a hyperplane through the point $\bar X, \bar X, \ldots, \bar X$ on OM and perpendicular to OM and another through the point $\bar X + d\bar X, \bar X + d\bar X, \ldots, \bar X + d\bar X$ and also perpendicular to OM, these planes will cut off a shell of one less

¹ This is merely a “short” equation for obtaining the sum of the squares
of the deviations from the mean of a sample by finding the sum of the squares 
of the deviations from an arbitrary origin (here the mean of the population) 
and the difference between the sample mean and this origin. 
dimension¹ throughout which the density of sample points continues to be proportional to

$$\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$

Fig. 73.

Within an error dependent upon infinitesimals of higher order, this second shell may be approximated by a "rectangular" shell whose inner radius is $\sqrt{\sum(X_i - \bar X)^2} = \sqrt N\,\sigma$, whose outer radius is $\sqrt N(\sigma + d\sigma)$, and whose width is $d\bar X$ (see Fig. 74A, B, C).² The proportionate frequency of points in this second shell will be approximated by the product of the density factor

$$\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]$$

¹ In three dimensions, the first shell is a true shell, the second is a “ring” with width approximately equal to $d\bar X$ and thickness equal to $d\sigma$ (see Fig. 74A, B, C).

² The argument is thought out in three dimensions but is generalized to N dimensions. The diagrams illustrate the three-dimensional argument. Figure 74A shows half the original shell. Figure 74B shows the rectangular approximation. Figure 74C shows that the areas of the cross sections are practically the same, although of different shapes.
times the volume of the shell, or some factor proportional to this volume. This relative frequency will represent the relative frequency, or probability, of all points whose means lie between $\bar X$ and $\bar X + d\bar X$ and whose standard deviations lie between $\sigma$ and $\sigma + d\sigma$. It will thus give an expression for the distribution of all possible samples in terms of their means and standard deviations.

Since the radius of the shell in question is proportional to $\sigma$, its volume will be proportional to $\sigma^{N-2}\,d\sigma\,d\bar X$. For a hyperplane in N dimensions will intersect an N-dimensional hypersphere in another hypersphere of N − 1 dimensions (e.g., a plane will intersect a sphere in a circle), and the generalized surface area of the latter will be proportional to the radius raised to the (N − 2)th power. For example, the surface of a three-dimensional sphere is proportional to $r^2$ and the circumference of a circle to $r$. The total volume will equal this surface area multiplied by the width of the shell $d\bar X$ and its thickness $d\sigma$. Multiplying the volume of the shell by the density factor thus gives the relative frequency of the samples lying in the shell as proportional to

$$\sigma^{N-2}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\sigma\,d\bar X$$

But $d\sigma^2 = 2\sigma\,d\sigma$, or $d\sigma = d\sigma^2/2\sigma$; therefore, the above may be written

$$\frac12(\sigma^2)^{\frac{N-3}{2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X\,d\sigma^2$$
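Thus it is to be concluded that the relative frequency of a sample having a mean lying between $\bar X$ and $\bar X + d\bar X$ and a variance lying between $\sigma^2$ and $\sigma^2 + d\sigma^2$ is

$$K(\sigma^2)^{\frac{N-3}{2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]\exp\left[\frac{-N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]d\bar X\,d\sigma^2 \tag{15}$$

where K is some constant independent of $\sigma^2$ and $\bar X$.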
Distribution of All Sample Means. If $\sigma^2$ is given a particular value, Eq. (15) gives the distribution of sample means; i.e., it gives the relative frequencies of samples with different mean values, but all with the same $\sigma^2$. It shows that these relative frequencies are proportional to

$$\exp\left[\frac{-(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X$$

which is obviously the form of a normal frequency distribution with mean at $\tilde X$ and variance equal to $\tilde\sigma^2/N$. It also shows that the form of the distribution is the same no matter what the value of $\sigma^2$ chosen. In other words, the distribution of sample means is independent of the value of the sample variance. Hence, if the relative frequencies are summed for all values of $\sigma^2$, the relative frequencies of various mean values will be given in general by

$$dP(\bar X) = K_1\exp\left[\frac{-(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X$$
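To obtain $K_1$, integrate the expression above from −∞ to +∞ and set it equal to 1, the total frequency. Thus

$$\int_{-\infty}^{\infty}K_1\exp\left[\frac{-(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X = 1$$

or, setting $y = \frac{\bar X - \tilde X}{\tilde\sigma/\sqrt N}$ and noting that $dy = \frac{d\bar X}{\tilde\sigma/\sqrt N}$,

$$\sqrt{2\pi}\,\frac{\tilde\sigma}{\sqrt N}K_1\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}e^{-y^2/2}\,dy = 1$$

But the integral is obviously the area under a normal curve of unit variance and hence is 1. Thus $K_1 = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt N}$. Hence

$$dP(\bar X) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt N}\exp\left[\frac{-(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X \tag{16}$$

which is the distribution of all sample means.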

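Distribution of All Sample Variances. If $\bar X$ is given a particular value, Eq. (15) gives the distribution of sample variances; i.e., it gives the relative frequencies of samples with different variances, but all with the same $\bar X$. It shows that these relative frequencies are proportional to

$$(\sigma^2)^{\frac{N-3}{2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2$$

and thus indicates that the distribution of sample variances is independent of the value of the sample mean. Hence, if the relative frequencies are summed for all sample mean values, the relative frequencies of various variances will be given in general by

$$dP(\sigma^2) = K_2(\sigma^2)^{\frac{N-3}{2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2$$

To obtain $K_2$, integrate this expression from 0 to +∞ and set it equal to 1, the total frequency. Thus

$$\int_0^\infty K_2(\sigma^2)^{\frac{N-3}{2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 = 1$$

But if $N\sigma^2/\tilde\sigma^2$ is set equal to $\chi^2$, this may be put in the form

$$K_2\left(\frac{\tilde\sigma^2}{N}\right)^{\frac{N-1}{2}}\int_0^\infty(\chi^2)^{\frac{N-3}{2}}e^{-\chi^2/2}\,d\chi^2 = 1$$

and the integral equals $2^{\frac{N-1}{2}}\Gamma\left(\frac{N-1}{2}\right)$ (by Eq. (7), on setting $x = \chi^2/2$); therefore,

$$K_2 = \frac{N^{\frac{N-1}{2}}}{2^{\frac{N-1}{2}}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac{N-1}{2}}}$$

and

$$dP(\sigma^2) = \frac{N^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-3}{2}}}{2^{\frac{N-1}{2}}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac{N-1}{2}}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2 \tag{17}$$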
If $N\sigma^2/\tilde\sigma^2$ is set equal to $\chi^2$, this takes the form of a $\chi^2$ distribution with n = N − 1. Since $d\sigma^2 = 2\sigma\,d\sigma$, the distribution of $\sigma$ is

$$dP(\sigma) = \frac{N^{\frac{N-1}{2}}\sigma^{N-2}}{2^{\frac{N-3}{2}}\Gamma\left(\frac{N-1}{2}\right)\tilde\sigma^{N-1}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma \tag{18}$$


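Equation (17) can be checked numerically. The sketch below codes the density of $\sigma^2$ from Eq. (17), using `math.lgamma` to keep the constant numerically stable, and integrates it with a midpoint rule; the choices N = 5 and $\tilde\sigma^2 = 4$ are arbitrary. The total should be 1, and the mean of $\sigma^2$ should be $(N-1)\tilde\sigma^2/N$, since a $\chi^2$ distribution with n degrees of freedom has mean n.

```python
import math


def variance_density(s2, n_sample, pop_var):
    """Density of the sample variance s2 (divisor N), Eq. (17)."""
    if s2 <= 0:
        return 0.0
    k = (n_sample - 1) / 2
    log_const = (k * math.log(n_sample) - k * math.log(2)
                 - math.lgamma(k) - k * math.log(pop_var))
    return math.exp(log_const + (k - 1) * math.log(s2)
                    - n_sample * s2 / (2 * pop_var))


def midpoint_moments(n_sample, pop_var, upper, steps=100_000):
    """Return (integral of the density, mean of s2) over [0, upper]."""
    h = upper / steps
    total = mean = 0.0
    for i in range(steps):
        s2 = (i + 0.5) * h
        w = variance_density(s2, n_sample, pop_var) * h
        total += w
        mean += s2 * w
    return total, mean


total, mean = midpoint_moments(n_sample=5, pop_var=4.0, upper=80.0)
print(total, mean)   # about 1 and about (5 - 1) * 4 / 5 = 3.2
```

The mean $(N-1)\tilde\sigma^2/N$ falls short of $\tilde\sigma^2$, which is the familiar reason for dividing by N − 1 when estimating the population variance.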
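Distribution of Sample Values of $\sqrt{N-1}(\bar X - \tilde X)/\sigma$. Since the distribution of sample variances is independent of the distribution of sample means, and vice versa, their joint distribution is the product of their individual distributions, shown by Eqs. (16) and (17). The exact form for Eq. (15) is thus given by the product of Eqs. (16) and (17), or

$$dP(\bar X,\sigma^2) = \frac{1}{\sqrt{2\pi}\,\tilde\sigma/\sqrt N}\exp\left[\frac{-(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X \cdot \frac{N^{\frac{N-1}{2}}(\sigma^2)^{\frac{N-3}{2}}}{2^{\frac{N-1}{2}}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac{N-1}{2}}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2}\right]d\sigma^2$$

$$= \frac{N^{\frac N2}(\sigma^2)^{\frac{N-3}{2}}}{\sqrt\pi\,2^{\frac N2}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac N2}}\exp\left[\frac{-N\sigma^2}{2\tilde\sigma^2} - \frac{(\bar X - \tilde X)^2}{2\tilde\sigma^2/N}\right]d\bar X\,d\sigma^2 \tag{15'}$$

The distribution of $\sqrt{N-1}(\bar X - \tilde X)/\sigma$ may be derived from (15') as follows: First set the symbol $t = \sqrt{N-1}(\bar X - \tilde X)/\sigma$, and note that $s = \sigma\sqrt{N/(N-1)}$. Therefore, $t = \sqrt N(\bar X - \tilde X)/s$, while

$$(\bar X - \tilde X)^2 = \frac{\sigma^2t^2}{N-1} \qquad\text{and}\qquad d\bar X = \frac{\sigma}{\sqrt{N-1}}\,dt$$

Substituting for $(\bar X - \tilde X)^2$, and $\frac{\sigma}{\sqrt{N-1}}\,dt$ for $d\bar X$, in (15') yields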
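$$dP(t,\sigma^2) = \frac{N^{\frac N2}(\sigma^2)^{\frac{N-2}{2}}}{\sqrt{\pi(N-1)}\,2^{\frac N2}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac N2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2} - \frac{N}{N-1}\cdot\frac{\sigma^2t^2}{2\tilde\sigma^2}\right]d\sigma^2\,dt$$

which may be put in the form

$$dP(t,\sigma^2) = \frac{N^{\frac N2}(\sigma^2)^{\frac{N-2}{2}}}{\sqrt{\pi(N-1)}\,2^{\frac N2}\Gamma\left(\frac{N-1}{2}\right)(\tilde\sigma^2)^{\frac N2}}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right]d\sigma^2\,dt$$

Further manipulation shows that the following is the equivalent of the foregoing:

$$dP(t,\sigma^2) = \left[\frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right]^{\frac N2 - 1}\exp\left[-\frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right]d\left[\frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right]\frac{dt}{\sqrt{N-1}\,\sqrt\pi\,\Gamma\left(\frac{N-1}{2}\right)\left(1 + \frac{t^2}{N-1}\right)^{\frac N2}}$$

Here t is taken as a constant in $d\left[\frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)\right]$. If the foregoing is summed or integrated for $\sigma^2$ for a given value of t, the probability of that t is given by

$$dP(t) = \frac{dt}{\sqrt{N-1}\,\sqrt\pi\,\Gamma\left(\frac{N-1}{2}\right)\left(1 + \frac{t^2}{N-1}\right)^{\frac N2}}\int_0^\infty y^{\frac N2 - 1}e^{-y}\,dy, \qquad y = \frac{N\sigma^2}{2\tilde\sigma^2}\left(1 + \frac{t^2}{N-1}\right)$$

But the integral is of the form $\int_0^\infty e^{-x}x^{m-1}\,dx$, which equals $\Gamma(m)$, here $\Gamma\left(\frac N2\right)$. Hence, the distribution of t is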
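$$dP(t) = \frac{\Gamma\left(\frac N2\right)}{\sqrt{(N-1)\pi}\,\Gamma\left(\frac{N-1}{2}\right)\left(1 + \frac{t^2}{N-1}\right)^{\frac N2}}\,dt$$

or, if n is set equal to N − 1,

$$dP(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac n2\right)\left(1 + \frac{t^2}{n}\right)^{\frac{n+1}{2}}}\,dt$$

which has the form of the t distribution. Hence $\sqrt{N-1}(\bar X - \tilde X)/\sigma$ is distributed like t with n = N − 1.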
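The final formula can also be compared with simulation. The sketch below codes the t density above using `math.lgamma`, confirms that for n = 1 it reduces to $1/\pi(1 + t^2)$, and compares the probability it assigns to $|t| \le 2$ for N = 5 with the share of simulated samples in which $\sqrt{N-1}(\bar X - \tilde X)/\sigma$ falls in that range. The population values, N = 5, and the cut-off 2 are arbitrary choices for illustration; σ is computed with divisor N, as in the text, and the function names are hypothetical.

```python
import math
import random


def t_density(t, n):
    """Density of t with n degrees of freedom, as in the formula above."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c - (n + 1) / 2 * math.log1p(t * t / n))


def prob_within(c, n, steps=20_000):
    """Midpoint-rule estimate of P(|t| <= c) for n degrees of freedom."""
    h = 2 * c / steps
    return h * sum(t_density(-c + (i + 0.5) * h, n) for i in range(steps))


def simulated_share_within(c, n_sample, trials, pop_mean=0.0, pop_sd=1.0, seed=0):
    """Share of samples with |sqrt(N - 1) (xbar - pop_mean) / sigma| <= c."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(pop_mean, pop_sd) for _ in range(n_sample)]
        xbar = sum(xs) / n_sample
        sigma = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n_sample)  # divisor N
        t = math.sqrt(n_sample - 1) * (xbar - pop_mean) / sigma
        hits += abs(t) <= c
    return hits / trials


print(t_density(0.7, 1), 1 / (math.pi * (1 + 0.7 ** 2)))
print(prob_within(2.0, 4), simulated_share_within(2.0, 5, 50_000))
```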





CHAPTER XI 

SAMPLING FROM CONTINUOUS NORMAL POPULATIONS 
II. USES OF THE SAMPLING DISTRIBUTIONS 

SAMPLING DISTRIBUTION OF THE MEAN 
Used to Test a Hypothesis. Problem When Population Variance Is Known. The practical use of the sampling distributions discussed in the preceding chapter may be explained by reference to several examples. Consider first an illustration of how the sampling distribution of the mean is used to test a hypothesis. Suppose it is claimed that a new process of manufacturing electric light bulbs will increase their mean length of life to 1,100 kilowatt-hours. A manufacturer tries out this new process and produces a sample batch of 100 bulbs for which the mean length of life proves to be 1,090. The new process, it may be supposed, does not change the variability in length of life from bulb to bulb, and therefore the standard deviation in length of life may be taken as that of the old process. Suppose this is known to be 200 kilowatt-hours. The question to which the manufacturer seeks an answer is therefore this: Is it reasonable to infer that the given sample is from a population in which the mean is 1,100 kilowatt-hours? This is the question that will now be discussed in some detail.

Coefficient of Risk. Before answering the question the manufacturer must first determine what degree of risk he is willing to undergo in rejecting the hypothesis when it is true. Suppose he is willing to make this mistake once in twenty times on the average; his coefficient of risk will then be .05.

Region of Rejection. The next step is to select a “region of rejection,” which on the basis of the given hypothesis will mark off the unreasonable or unacceptable samples from the others, the probability of this subset of samples being just .05. Since the sampling distribution of the mean is normal in form, the argument is essentially the same as that pertaining to percentages (see Chap. IX).
If the manufacturer is indifferent as to what mean value other than the hypothetical mean value is the true one, he will do best to distribute his .05 region of rejection equally at both ends of the sampling distribution. If he wishes to reject the hypothesis more often when the true mean value is actually below the hypothetical mean value than when it is above this value, he will do best to put the .05 region entirely at the lower end of the distribution. If he wishes to reject the hypothesis more often when the true mean value is actually above the hypothetical mean value than when it is below this value, he will do best to put his .05 region entirely at the upper end of the sampling distribution.

In the present instance, a region of rejection lying entirely at the lower end of the distribution would appear to be the best region to adopt; for the manufacturer would not care if the claims of the new process were more than substantiated. He would care only if they fell short of being true.

Testing the Hypothesis. Since .05 of the area of a normal frequency curve lies more than 1.645 standard deviations below the mean, and since the standard deviation of the sampling distribution of the mean is equal to the standard deviation of the population divided by $\sqrt N$, the region of rejection lying entirely at the lower end of the distribution would consist of all samples whose means lay below 1,100 (the hypothetical population mean) minus

$$(1.645)\frac{200}{\sqrt{100}}$$

that is, below 1,100 − 32.9 = 1,067.1.
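The arithmetic of this test can be written out as a short function. The sketch below uses the standard library's `statistics.NormalDist` to obtain the lower .05 point of the normal curve (about −1.645) instead of reading it from a table; the function name and signature are hypothetical, and the population standard deviation is assumed known, as in the text.

```python
import math
from statistics import NormalDist


def lower_tail_mean_test(sample_mean, hyp_mean, pop_sd, n, risk=0.05):
    """Return (rejection boundary, reject?) for a lower-tail test on a mean."""
    std_error = pop_sd / math.sqrt(n)
    z = NormalDist().inv_cdf(risk)          # about -1.645 when risk = .05
    boundary = hyp_mean + z * std_error
    return boundary, sample_mean < boundary


boundary, reject = lower_tail_mean_test(1090, 1100, 200, 100)
print(round(boundary, 1), reject)   # 1067.1 False
```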

A graph of the sampling distribution of the mean and this region of rejection is shown in Fig. 75. Since the actual sample value was 1,090, it does not fall in this region of rejection and the hypothesis is not rejected. Even though the sample failed to have a mean as high as that claimed for the new process, the manufacturer will not reject the claim that the new process will make bulbs averaging 1,100 kilowatt-hours of life. That is, the amount by which the sample average fell short of the figure claimed was not large enough to warrant the rejection of the claim.

Other Examples. Other cases might be conceived of in which an upper region of rejection or an evenly distributed region of rejection would be the proper region to employ. For example, if a process of cigarette manufacture is claimed to yield cigarettes of a low burning temperature, an upper region of rejection would be appropriate. Or a manufacturer of automobile tires may wish to copy the tires of a competitor as closely as possible, and in this case he might be equally eager to avoid turning out tires that were markedly worse or markedly better than the competing brand; here an equally distributed region of rejection would be in order.



Confidence Limits for the Population Mean. Problem When 
the Population Variance Is Known. In many cases it is the aim 
of the statistical investigation to determine confidence limits 
for the mean of the population from which a sample has been 
drawn rather than to test any particular hypothesis. Again 
this may be done in a manner that is very similar to the process 
of determining confidence limits for percentage figures. The 
following discussion will relate to the electric-light bulb example 
described in the preceding section. As before, it will be assumed 
that the population variance is known. 

The Confidence Coefficient. Confidence limits, it will be recalled, are the end points of a certain range of values that may be said to include the true value with a given degree of probability. This degree of probability is called the “confidence coefficient” associated with the given “confidence interval.” Suppose in the present instance that the manufacturer of electric-light bulbs adopts a confidence coefficient of .95; that is, he wishes to be able to say that there is a probability of .95 that the confidence interval he establishes covers the true mean value.

The Confidence Interval. As previously pointed out, innumerable confidence intervals may be set up that will all have a confidence coefficient of .95. Suppose the manufacturer is as much concerned with underestimating the true mean value as with overestimating it. Then he will proceed as follows:

He will select a lower value such that if it is the true mean value the probability of getting the given sample mean or some higher value will be just .025. The process is illustrated in Fig. 76A. Next he will select some upper value such that if it is the true value the probability of getting the given sample mean or some lower value will be just .025. This is illustrated in Fig. 76B.

Fig. 76. (C: showing whole confidence interval.)

The two values so determined will mark the upper and lower confidence limits for the population mean, as illustrated graphically in Fig. 76C. The distance between either limit and the sample mean value will be equal to 1.96 times the standard deviation of the sampling distribution of the mean, that is, 1.96 times the “standard error” of the mean, as it is called. For the sampling distribution of the mean is normal, and the probability of a sample value exceeding the true mean value by 1.96 standard errors is just .025; and the same is true of a sample value falling short of the true mean value by 1.96 standard errors. Since the standard error of the mean is equal to the standard deviation of the population divided by the square root of N, the confidence limits for the population mean may be found in this instance by simply laying off $\pm 1.96\,\tilde\sigma/\sqrt N$ from the sample mean value.
For the given problem, these limits would be equal to

$$1{,}090 \pm 1.96\,\frac{200}{\sqrt{100}}$$

or 1,129.2 and 1,050.8 kilowatt-hours. This interval is symmetrical with respect to the sample mean value and does not tend to overestimate the true value more than it tends to underestimate it.
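These symmetric limits can be computed as follows. The sketch assumes, as the text does, that the population standard deviation is known; `statistics.NormalDist` supplies the .975 point (about 1.96), and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_limits(sample_mean, pop_sd, n, confidence=0.95):
    """Symmetric confidence limits for the mean, population s.d. known."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # about 1.96 for .95
    half_width = z * pop_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = two_sided_limits(1090, 200, 100)
print(round(low, 1), round(high, 1))   # 1050.8 1129.2
```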



Fig. 77. 

If the manufacturer is concerned solely with the possibility of overestimating the true mean value, he may desire only an upper confidence limit. The confidence interval will presumably run from zero to this upper bound. If the upper bound is selected as that value such that if it is the true mean value the probability of getting the sample mean value or a lower value is just .05, then this confidence interval may be said to have a probability of .95 of covering the true mean value. Since the probability of a normally distributed variate falling short of its true mean value by 1.645 standard deviations is just .05, the upper limit of the confidence interval may be found by adding 1.645 times the standard error of the mean to the sample mean value. This is illustrated in Fig. 77. For the given problem, this yields

$$1{,}090 + 1.645\,\frac{200}{\sqrt{100}} = 1{,}122.9$$

It may be said, then, that
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there is a probability of .95 that the range 0-1,122.9 covers the 
true mean value. 

If the manufacturer wished not to underestimate the true mean value, he would reverse the foregoing process. He would subtract 1.645 times the standard error of the mean from the sample mean value to obtain a lower bound for the true mean value. This is illustrated in Fig. 78 and would be

$$1{,}090 - 1.645\,\frac{200}{\sqrt{100}} = 1{,}057.1.$$

He could then say that there was a probability of .95 that the range 1,057.1 to ∞ covered the true mean value.
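The one-sided bounds follow the same pattern, with the .05 point of the normal curve (about 1.645) in place of the .025 point. Below is a minimal Python sketch. The helper name `one_sided_normal_bound` is hypothetical, and the exact normal point comes from `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def one_sided_normal_bound(sample_mean, pop_sd, n, confidence=0.95, upper=True):
    """Upper (or lower) confidence bound for the population mean, sigma known.
    The other end of the interval is 0 (upper bound) or infinity (lower bound)."""
    z = NormalDist().inv_cdf(confidence)       # about 1.645 for .95
    std_error = pop_sd / math.sqrt(n)
    return sample_mean + z * std_error if upper else sample_mean - z * std_error

print(round(one_sided_normal_bound(1090, 200, 100, upper=True), 1))    # 1122.9
print(round(one_sided_normal_bound(1090, 200, 100, upper=False), 1))   # 1057.1
```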



Relationship between Confidence Coefficient, Confidence Interval, and N. It is to be noted that the confidence limits set up in this way depend on the confidence coefficient adopted. The smaller this coefficient, the closer the limits to the sample value, and vice versa.¹ In other words, one can always narrow the range that is presumed to cover the true value if he is willing to increase the risk of its not doing so. Since the standard error of the mean is equal to the standard deviation of the population divided by the square root of N, it is also to be noted that, the larger the sample, the closer the confidence limits to the sample value. This is merely another way of saying that, the larger the size of the sample, the more confidence one can have that the true mean value is somewhere in the immediate neighborhood

¹ This, of course, is true only of the upper bound in Fig. 77 and the lower bound in Fig. 78.
of the sample mean value.¹ These points should be carefully noted.
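Both effects can be seen numerically. The Python sketch below uses a hypothetical helper, `half_width`. It computes the half-width of the symmetrical normal-theory interval for the electric-bulb figures (population standard deviation 200) at several confidence coefficients and several sample sizes. The particular coefficients and sample sizes are chosen only for illustration. As the footnote notes, this narrowing does not apply to the fixed end (0 or ∞) of a one-sided interval.

```python
import math
from statistics import NormalDist

def half_width(pop_sd, n, confidence):
    """Half-width of the symmetrical normal-theory interval for the mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * pop_sd / math.sqrt(n)

# Lowering the confidence coefficient narrows the interval ...
for conf in (0.99, 0.95, 0.90):
    print(conf, round(half_width(200, 100, conf), 1))

# ... and so does enlarging the sample: the width shrinks like 1 / sqrt(N).
for n in (25, 100, 400):
    print(n, round(half_width(200, n, 0.95), 1))
```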

Maximum-likelihood Estimate of the Population Mean. The Problem When Population Variance Is Known. In addition to setting up a range that might reasonably be presumed to include the population mean, the manufacturer of electric-light bulbs may wish to have a single estimate of the population mean that could be viewed as the “best” estimate to be made of this population parameter. As pointed out in the preview of theory, one way of making such a single estimate is to take that value for the population mean that makes the probability of the sample mean a maximum.² An estimate so made is a maximum-likelihood estimate.

Since the sampling distribution of the mean is normal, it follows that the probability of a sample mean is the greatest when the sample mean has the same value as the population mean.³ If, therefore, in estimating the population mean from the sample mean, the former is taken equal to the latter, then the probability of the given sample mean is a maximum.

In the problem under discussion the mean of the sample of electric bulbs was 1,090 kilowatt-hours. The maximum-likelihood estimate of the population mean is therefore $\bar{\bar{X}} = 1{,}090$ kilowatt-hours.
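The maximum-likelihood argument can be imitated by brute force. The sketch evaluates the normal ordinate of the observed sample mean for many trial values of the population mean and keeps the largest. This grid search is only an illustration, and the grid (1,000 to 1,200 in steps of .5) and the function name are assumptions. The analytic argument in the text is what actually establishes the result.

```python
import math
from statistics import NormalDist

def ml_estimate_mean(sample_mean, pop_sd, n, candidates):
    """Return the candidate population mean that maximizes the probability
    density of the observed sample mean (simple grid search)."""
    std_error = pop_sd / math.sqrt(n)
    return max(candidates, key=lambda mu: NormalDist(mu, std_error).pdf(sample_mean))

grid = [1000 + 0.5 * k for k in range(401)]        # 1,000 to 1,200 kilowatt-hours
print(ml_estimate_mean(1090, 200, 100, grid))      # 1090.0
```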

SAMPLING DISTRIBUTION OF $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$

Used to Test a Hypothesis. The Problem When the Population Variance Is Unknown. The analysis of the preceding section was based on the assumption that the variance of the population is known. If the variance of the population is not known and the hypothesis to be tested does not prescribe any value for this parameter, the sampling distribution of the statistic $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ should be used.⁴

¹ A more precise statement is this: The larger the value of N, the greater the probability that a given finite confidence interval, however small, will include the true value.

² See p. 180.

³ For then $P(\bar{X})$ would equal $\dfrac{\sqrt{N}}{\bar{\sigma}\sqrt{2\pi}}$, which is the highest value it can have.

⁴ This is called a “composite hypothesis,” for it supposes that the sample
For purposes of illustration suppose that the electric-light bulb example is modified as follows: An inventor offers a new process of manufacturing electric-light bulbs that will, he claims, increase the mean length of life to 1,100 kilowatt-hours. In the absence of any prolonged experience with the new process, the inventor makes no claims regarding the variability in length of life from bulb to bulb, and there is no reason a priori to believe that the variability of bulbs manufactured by the new process will be the same as that of bulbs manufactured by the old. The latter, it will be recalled, was the assumption of the preceding section. Furthermore, suppose the manufacturer to whom the process is offered is interested, not in the variability in length of life of the bulbs, but only in their mean length of life. To test the claim of the inventor he manufactures a batch of 10 bulbs by the new process and finds that the mean length of life of these 10 bulbs is 1,090 kilowatt-hours.¹

In view of this result, is it reasonable to accept the hypothesis that the new process will in general produce bulbs whose mean length of life is 1,100 kilowatt-hours? Note that nothing is said here about the variance in length of life. To put the question another way, is it reasonable to assume that this sample could have come from a population in which the mean length of life was 1,100 kilowatt-hours and the variance was any value whatever?


Since the hypothesis does not prescribe any particular value for the variance of the population and its value is not known, the proper statistic to use in testing the given hypothesis is

$$\frac{\sqrt{N}(\bar{X} - \bar{\bar{X}})}{\hat{\sigma}},$$

for in this statistic only the hypothetical value of the population mean, $\bar{\bar{X}}$, enters. The quantity $\hat{\sigma}^2$, it will be recalled, is the maximum-likelihood estimate of the population variance that is made from the sample; it is equal to the variance of the sample times $N/(N-1)$. For the problem illustrated, let the value of the sample variance be $(180)^2$ square kilowatt-hours.²


is not from some one specific population but from any one of a group of populations all of which have the same mean.

¹ A small sample is used here purposely (see p. 284).

² This $\hat{\sigma}^2$, it is recalled, is calculated from the sample data by the formula

$$\hat{\sigma}^2 = \frac{\sum (X_i - \bar{X})^2}{N - 1}.$$
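The sketch below shows two equivalent ways of obtaining $\hat{\sigma}$. The first works from the raw observations. The second works from the sample standard deviation computed with divisor N, multiplied by $\sqrt{N/(N-1)}$. The function names are hypothetical, and only the figures 180 and N = 10 are taken from the example.

```python
import math

def sigma_hat(data):
    """Estimate of the population standard deviation used in the t statistic:
    sqrt( sum (X_i - mean)^2 / (N - 1) )."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def sigma_hat_from_sample_sd(sample_sd, n):
    """The same quantity obtained from the sample standard deviation (divisor N)."""
    return sample_sd * math.sqrt(n / (n - 1))

print(round(sigma_hat_from_sample_sd(180, 10), 1))   # 189.7
```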
Coefficient of Risk. In order to answer the question that has been posed, the manufacturer must adopt a definite coefficient of risk that will measure the chance he is willing to take of rejecting the hypothesis when it is actually true. As before, suppose he adopts a coefficient of risk equal to .05.

Testing the Hypothesis with a Symmetrical Region of Rejection. Next the manufacturer must adopt a definite region of rejection with a probability of .05. Such a region will be attained if, from among all the possible sample values of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ as represented by the sampling distribution of this statistic, he chooses a special subset such that the probability of a sample falling in this subset is just .05. Since the various possible sample values of this statistic are distributed in the form of a t distribution, with n = N − 1, the manufacturer will secure a .05 region of rejection if he includes in the region all values of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ that are numerically greater than 2.262. The value 2.262 is found by consulting a table of the t distribution, in which it is seen that for n = 9 (that is, 10 − 1) the probability of getting an absolute value of t that is equal to or greater than 2.262 is just .05. As shown in Fig. 79, this is a symmetrical region of rejection because the t distribution is symmetrically distributed about its mean of zero.

Fig. 79.

If the manufacturer adopts this symmetrical region of rejection, he can test the given hypothesis as follows: This hypothesis is that $\bar{\bar{X}} = 1{,}100$. The sample mean $\bar{X}$ is 1,090, N = 10, and the sample standard deviation is 180. The last gives $\hat{\sigma} = 180\sqrt{10/9} = 189.7$. For these values the sample value of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ is

$$\frac{\sqrt{10}\,(1{,}090 - 1{,}100)}{189.7} = -.167.$$

This is greater than −2.262 and less than +2.262, and the sample value does not therefore fall in the region of rejection. This is illustrated by Fig. 79. The sample mean, although less than 1,100, is not sufficiently less than this quantity to cast doubt on 1,100 being the population mean.
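The test can be reproduced in a few lines. This Python sketch hard-codes the tabled .025 point of t for n = 9 (2.262) quoted in the text instead of computing it. The helper name `t_statistic` is hypothetical.

```python
import math

def t_statistic(sample_mean, hyp_mean, sigma_hat, n):
    """sqrt(N) * (sample mean - hypothetical population mean) / sigma_hat."""
    return math.sqrt(n) * (sample_mean - hyp_mean) / sigma_hat

sig_hat = 180 * math.sqrt(10 / 9)      # about 189.7, from the sample s.d. of 180
t = t_statistic(1090, 1100, sig_hat, 10)
T_025 = 2.262                          # tabled .025 point of t for n = 9
print(round(t, 3), abs(t) > T_025)     # -0.167 False  (hypothesis not rejected)
```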

Other Regions of Rejection. The region of rejection adopted in the foregoing instance was a symmetrical region with respect to the t distribution. This would be the proper region to adopt if the manufacturer were indifferent to whether the true mean value of the new process was above or below the claimed value. Since it is reasonable to suppose that he would be more concerned if the true mean value were less than 1,100 than if it were above this amount, a region that contains sample values all at the lower end of the distribution is more appropriate for testing the given hypothesis. Such a region would be given by those values of t that (for n = 9) are equal to or less than −1.833. For the t table shows that the probability of a t less than −1.833 is just .05.* A picture of this appropriate region and the location of the sample value with reference to it is shown in Fig. 80.

Fig. 80.

In another case a region of rejection lying all at the upper 
end of the distribution might be the more appropriate. It is 
suggested that the reader himself think of a case in which this 
might be true and work out an illustrative problem. 

* The table gives the probability of an absolute value of t equal to or greater than 1.833 as .10; hence the probability of a t equal to or less than −1.833 is .05.
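The choice among the three regions of rejection can be written as a small decision function. This sketch assumes n = 9 and the tabled points 2.262 and 1.833 given in the text. The name `reject` and the region labels are hypothetical. A t of, say, −2.0 would fall in the lower-tail region but not in the symmetrical one, which shows that the choice of region can change the verdict.

```python
def reject(t, region, t_two_sided=2.262, t_one_sided=1.833):
    """Decide whether a sample t (n = 9) falls in a .05 region of rejection.
    region: 'both' (symmetrical), 'lower' (lower tail) or 'upper' (upper tail).
    The critical values are the tabled ones quoted in the text for n = 9."""
    if region == "both":
        return abs(t) >= t_two_sided
    if region == "lower":
        return t <= -t_one_sided
    if region == "upper":
        return t >= t_one_sided
    raise ValueError(f"unknown region: {region!r}")

t = -1 / 6            # the sample value -.167 found in the text
for region in ("both", "lower", "upper"):
    print(region, reject(t, region))   # False in every case for this sample
```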




Case Where Population Variance Is Known Compared with Case Where Population Variance Is Not Known. The difference between the case in which the standard deviation of the population is assumed to be known and the present case, in which it is not assumed to be known, may be made clear by several diagrams. These diagrams show how the various regions of rejection look in terms of Fig. 64 (page 224). Consider first the symmetrical regions of rejection. In the case in which the standard deviation of the population is known, a hypothesis will be rejected if the sample falls in a region in which the mean of the sample deviates from the hypothetical population mean by more than a given amount, say $1.96\,\bar{\sigma}/\sqrt{N}$, for a .05 symmetrical region. In terms of Fig. 64 this means that all samples falling outside the lines L and U shown in Fig. 81a, which is a miniature reproduction of Fig. 64
omitting the probabilities, will lead to the rejection of the hypothesis that the population mean is $\bar{\bar{X}}$. This is illustrated by the crosshatched portions of Fig. 81a.

In the case in which the standard deviation of the population is not known, the hypothesis is rejected if the sample falls in a region for which $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ is numerically greater than the .025 value of t. In terms of Fig. 64, this means that all samples lying between the lines a and b, Fig. 81b, which is also a miniature reproduction of Fig. 64 omitting the probabilities, will lead to the rejection of the hypothesis that $\bar{\bar{X}}$ is the mean of the population. This is illustrated by the crosshatched portion of Fig. 81b.

Although the regions are symmetrical in both instances, it is to be noted that they do not include the same set of samples. Some samples are common to each, but there are other samples that are included in only one of the two regions.

As shown by Fig. 81c, samples falling in regions marked A would lead to the rejection of the hypothesis in either instance. Samples falling in regions C would lead to rejection of the hypothesis only when the standard deviation is known, while samples falling in regions B would lead to rejection of the hypothesis only when the standard deviation of the population is estimated from the sample.

The reason why the latter samples lead to rejection of the hypothesis despite the fact that their mean values are relatively close to the population mean is that their standard deviations, and hence the estimates of the population standard deviation, are so small that the difference between the hypothetical population mean and the sample mean, small as it actually is, appears to be relatively large. Likewise, although the means of samples falling in regions C differ greatly from $\bar{\bar{X}}$, these samples do not lead to rejection of the hypothesis when the standard deviation of the population is not known, simply because the standard deviations of these samples (and hence the estimates of the population standard deviation) are so large that they make this difference seem relatively small. Similar remarks apply to the one-sided regions shown in Fig. 82a to c.
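The contrast between regions B and C can be checked with two hypothetical samples of 10, each tested against a population mean of 1,100. In the known-variance case the population standard deviation is taken as 200. The sample means and values of $\hat{\sigma}$ below are invented purely so that the samples land in regions B and C. They are not data from the text.

```python
import math

Z_025 = 1.96      # .025 point of the normal curve
T_025 = 2.262     # .025 point of t for n = 9 (tabled value quoted in the text)

def known_sigma_rejects(sample_mean, hyp_mean, pop_sd, n):
    """Symmetrical .05 test when the population standard deviation is known."""
    return abs(math.sqrt(n) * (sample_mean - hyp_mean) / pop_sd) > Z_025

def unknown_sigma_rejects(sample_mean, hyp_mean, sigma_hat, n):
    """Symmetrical .05 t test when the standard deviation is estimated."""
    return abs(math.sqrt(n) * (sample_mean - hyp_mean) / sigma_hat) > T_025

# Hypothetical samples: (sample mean, sigma_hat estimated from the sample)
cases = {"B": (1060, 30), "C": (960, 400)}
for label, (xbar, s_hat) in cases.items():
    print(label, known_sigma_rejects(xbar, 1100, 200, 10),
          unknown_sigma_rejects(xbar, 1100, s_hat, 10))
# B False True   (close mean, tiny spread: rejected only by the t test)
# C True False   (distant mean, large spread: rejected only when sigma is known)
```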
Confidence Limits for the Population Mean. Problem When the Population Variance Is Unknown. The determination of confidence limits for the population mean when the population variance is not known proceeds much the same as in the case when the variance is known. The essential difference is that the t distribution is used in place of the normal distribution.

Suppose that a manufacturer finds that the mean length of life of 10 electric-light bulbs manufactured by some new process is 1,090 kilowatt-hours and that the standard deviation of this sample is 180 kilowatt-hours. Nothing is known about the standard deviation in length of life of bulbs producible by the new process other than the information provided by the sample. Under these conditions the manufacturer wishes to determine limits within which the average length of life of bulbs producible by the new process might in general be expected to lie. That is, he wishes to set up confidence limits for the mean of the population of bulbs producible by the new process. In setting up these limits he wants a confidence coefficient of, say, .95.

Determination of Confidence Limits. If the manufacturer is indifferent as to whether he overestimates or underestimates the true mean length of life, he will adopt limits that are symmetrical with reference to the given sample mean. In this case, the upper limit may be found by determining a value of $\bar{\bar{X}}$ such that the probability of getting the sample value of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ or a lower value is just .025. The lower limit may be found by determining the value of $\bar{\bar{X}}$ that makes the probability of getting the sample value of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ or a higher value just equal to .025. For if this procedure is always adopted in setting up symmetrical confidence limits, the limits so determined will include the population mean 95 times out of 100 on the average.¹ The sampling distribution of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ is a t distribution, with n in the t formula equal to N − 1, and for n = 10 − 1 = 9 the .025 points of the t distribution are ±2.262. These upper and lower limits for $\bar{\bar{X}}$ can therefore be found by setting $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma} = \pm 2.262$ and solving for $\bar{\bar{X}}$. For the given problem, with $\hat{\sigma} = 180\sqrt{10/9}$, this yields

$$\frac{\sqrt{10}\,(1{,}090 - \bar{\bar{X}})}{180\sqrt{10/9}} = \pm 2.262,$$

or $\bar{\bar{X}} = 1{,}225.72$ as the upper limit and 954.28 as the lower limit. The analysis is illustrated by Fig. 83.
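Below is a minimal sketch of the symmetrical t limits. It assumes, as in the text, that the sample standard deviation is computed with divisor N, so that $\hat{\sigma}/\sqrt{N} = 180/\sqrt{9} = 60$. The tabled point 2.262 is passed in rather than computed, and the helper name is hypothetical.

```python
import math

def t_mean_limits(sample_mean, sample_sd, n, t_point):
    """Symmetrical confidence limits for the population mean, variance unknown.
    sample_sd uses divisor N, so sigma_hat / sqrt(N) = sample_sd / sqrt(N - 1)."""
    sigma_hat = sample_sd * math.sqrt(n / (n - 1))
    half = t_point * sigma_hat / math.sqrt(n)
    return sample_mean - half, sample_mean + half

low, high = t_mean_limits(1090, 180, 10, 2.262)   # 2.262: tabled .025 point, n = 9
print(round(low, 2), round(high, 2))              # 954.28 1225.72
```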

If the manufacturer wishes not to overestimate the mean of the population and does not care whether it is underestimated or not, he will seek only an upper bound for his confidence interval; in other words, he will seek an interval that runs from zero to this upper bound. For a confidence coefficient of .95, such an upper bound may be found by determining the value of $\bar{\bar{X}}$ that makes the probability of the sample value of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ or a lower value just equal to .05. Since the lower .05 point of the t distribution (for n = 10 − 1 = 9) is −1.833, the upper bound for $\bar{\bar{X}}$ in the present instance is given by

$$\frac{\sqrt{10}\,(1{,}090 - \bar{\bar{X}})}{180\sqrt{10/9}} = -1.833,$$

which gives the value 1,199.98 for $\bar{\bar{X}}$. The whole confidence interval is thus 0 to 1,199.98.

¹ See pp. 174-179.
[…] such an instance he may wish only to determine a lower bound for the population mean value, the upper bound presumably being infinity. For a confidence coefficient of .95, a lower bound of this kind can be found by determining the value of $\bar{\bar{X}}$ that makes the probability of the sample value of $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ or a higher value just equal to .05. Since the upper .05 point of the t distribution (for n = 10 − 1 = 9) is +1.833, the lower bound for $\bar{\bar{X}}$ in the given problem is yielded by the equation

$$\frac{\sqrt{10}\,(1{,}090 - \bar{\bar{X}})}{180\sqrt{10/9}} = +1.833,$$

which gives $\bar{\bar{X}} = 980.02$. The whole confidence interval is thus 980.02 to ∞. The analysis is illustrated in Fig. 85.

Fig. 85.
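The one-sided bounds differ from the symmetrical limits only in the tabled point used (1.833, the .05 point of t for n = 9) and in which side of the sample mean the bound is placed. The following is a minimal sketch with a hypothetical helper. As above, it assumes a sample standard deviation computed with divisor N.

```python
import math

def one_sided_t_bound(sample_mean, sample_sd, n, t_point, upper=True):
    """Upper bound (interval 0 to bound) or lower bound (bound to infinity)
    for the population mean when the population variance is unknown."""
    std_error = sample_sd * math.sqrt(n / (n - 1)) / math.sqrt(n)   # sigma_hat / sqrt(N)
    return sample_mean + t_point * std_error if upper else sample_mean - t_point * std_error

T_05 = 1.833   # tabled .05 point of t for n = 9
print(round(one_sided_t_bound(1090, 180, 10, T_05, upper=True), 2))    # 1199.98
print(round(one_sided_t_bound(1090, 180, 10, T_05, upper=False), 2))   # 980.02
```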

Maximum-likelihood Estimate of the Population Mean. Problem When the Population Variance Is Unknown. When the variance of the population is not known, the maximum-likelihood estimate of the population mean is based on the t distribution instead of the normal distribution. The procedure, however, is exactly the same. The optimum estimate of $\bar{\bar{X}}$ is the value of $\bar{\bar{X}}$ that makes the probability of the sample $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ a maximum. The sampling distribution of this statistic is a t distribution, the peak of which always occurs at zero. The given sample $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ will therefore have the greatest probability of occurrence when $\bar{\bar{X}}$ is such that $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma} = 0$, that is, when $\bar{\bar{X}} = \bar{X}$. As in the previous case, therefore, the optimum estimate of the population mean is the value of the sample mean.
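As in the normal case, a brute-force check is possible. The sketch writes down the ordinate of the t distribution, evaluates it at the sample t for many trial population means, and keeps the largest. It codes the t density directly with `math.lgamma`. The grid and the helper names are assumptions made for illustration.

```python
import math

def t_density(x, n):
    """Ordinate of Student's t distribution with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + x * x / n) ** (-(n + 1) / 2)

def ml_mean_t(sample_mean, sigma_hat, n, candidates):
    """Candidate population mean that makes the observed t most probable."""
    def likelihood(mu):
        return t_density(math.sqrt(n) * (sample_mean - mu) / sigma_hat, n - 1)
    return max(candidates, key=likelihood)

grid = [1000 + 0.5 * k for k in range(401)]                  # trial population means
print(ml_mean_t(1090, 180 * math.sqrt(10 / 9), 10, grid))   # 1090.0
```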

Use of Normal Curve with Large Samples. The foregoing procedure for testing hypotheses, finding confidence limits, and determining optimum estimates when the variance of the population is not known will give exact results whether the sample is large or small. If the sample is large, however, say 30 or more, approximate results may be obtained by using the normal curve in place of the t distribution. That is, for large samples, $\sqrt{N}(\bar{X} - \bar{\bar{X}})/\hat{\sigma}$ may be treated as a standard normal deviate and looked up in a normal table. The basis for the procedure is that when N is large the t distribution is almost identical with the standard normal curve, so that the latter can be used in its place. It is for this reason that the t table does not give values for n > 30.
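The speed with which t approaches the normal curve can be seen by computing the .025 point of t for several values of n. The sketch below is not the method used to build the tables. It integrates the t ordinate numerically by Simpson's rule and then inverts the tail probability by bisection, and both of these choices are ours. The sketch should reproduce the tabled 2.262 for n = 9, and its points draw close to the normal value 1.960 as n grows.

```python
import math
from statistics import NormalDist

def t_density(x, n):
    """Ordinate of Student's t distribution with n degrees of freedom."""
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(n * math.pi)
    return math.exp(log_c) * (1 + x * x / n) ** (-(n + 1) / 2)

def t_upper_tail(x, n, steps=2000):
    """P(t > x) for x >= 0: one half minus Simpson's-rule integral over [0, x]."""
    h = x / steps
    total = t_density(0.0, n) + t_density(x, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    return 0.5 - total * h / 3

def t_point(tail, n, iterations=50):
    """Value x with P(t > x) = tail (0 < tail < .5), found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, n) > tail:
            lo = mid   # too much probability beyond mid: the point lies further out
        else:
            hi = mid
    return (lo + hi) / 2

print("normal", round(NormalDist().inv_cdf(0.975), 3))
for n in (9, 30, 120):
    print(n, round(t_point(0.025, n), 3))
```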


SAMPLING DISTRIBUTION OF THE VARIANCE 


Testing a Hypothesis. The Problem. The foregoing sections were concerned with the mean of the population. Consider now a problem that is concerned with the variance. Suppose that the inventor of the new process for the manufacture of electric-light bulbs claims that his process will reduce the variability in length of life of the bulbs. The process, it will be assumed, does not change the mean length of life but merely makes possible a more uniform and hence a more dependable product.

To make the problem concrete, suppose that the inventor claims that the standard deviation for the new electric-light bulbs is 180 kilowatt-hours, which represents, it will be supposed, a considerable reduction from the standard deviation of bulbs now in use. To test this claim a manufacturer produces 10 bulbs by the new process and finds that the standard deviation of this sample is 190 kilowatt-hours.¹ The question is: In view of the sample result, is the claim of a population standard deviation


¹ A small sample is taken to show that the analysis can be applied to small as well as to large samples. For a simpler method applicable only to large samples, see pp. 289-290.
of 180 kilowatt-hours a reasonable one? Or, to put this in terms of variances, with a sample variance of 36,100 square kilowatt-hours, is the claim of a population variance of 32,400 square kilowatt-hours a tenable hypothesis?

Coefficient of Risk. Suppose, as in the case of the mean, that the manufacturer is willing to run the chance of rejecting hypotheses of this kind 5 times out of 100 when they are actually true. His coefficient of risk is thus .05.

Regions of Rejection. To assure a coefficient of risk of .05 
the manufacturer must select from the whole set of samples 
that might be drawn from the given hypothetical population 
a subset of samples the probability of which is just .05. If he 
rejects the hypothesis whenever the given sample is found 
to belong to this special subset, he will attain a coefficient of risk 
equal to .05. This subset will be his region of rejection. 

Since the manufacturer is interested here in the variance of the electric-light bulbs producible by the new process, the set of all possible samples and the chosen region of rejection can most appropriately be described in terms of the sample variance. When the set of all possible samples is so described, the result is the sampling distribution of the variance. This, as already noted, is a skewed distribution which centers roughly around the variance of the population; its mean is actually $\frac{N-1}{N}\bar{\sigma}^2$.* If the unit of measurement is taken as $\bar{\sigma}^2/N$, the distribution assumes the form of a $\chi^2$ distribution, with n in the $\chi^2$ equation equal to N − 1.†

Since the manufacturer wishes the variance in bulbs to be as small as possible, he would like to reduce to a minimum the chance of accepting the inventor’s claim when in actuality the variance of the new process is above the hypothetical figure. The most appropriate region of rejection for him to select is consequently the region comprising the upper .05 tail of the sampling distribution. A $\chi^2$ table shows that, for n = 10 − 1 = 9, this comprises all values of $\chi^2$ greater than 16.919, that is, all

* For the mean of $N\sigma^2/\bar{\sigma}^2$ is N − 1, since the mean of $\chi^2$ is n. See †.
† See pp. 111-112, 263-263.
samples whose variances are greater than $16.919\,\bar{\sigma}^2/10$, or $1.6919\,\bar{\sigma}^2$. This is illustrated in Fig. 86.

Testing the Hypothesis. The hypothesis to be tested in the given instance is that $\bar{\sigma}^2 = 32{,}400$. The region of rejection thus constitutes all samples whose variances exceed

$$1.6919 \times 32{,}400 = 54{,}818.$$

The given sample variance is 36,100, which is far from this region of rejection. Hence the given hypothesis is not rejected; the claim of the inventor is not disproved.
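The test amounts to comparing $N\sigma^2/\bar{\sigma}^2$ with the tabled upper .05 point of $\chi^2$. Equivalently, it compares the sample variance with the boundary $16.919\,\bar{\sigma}^2/N$. In the sketch below the tabled point is passed in as a parameter, and the function name is hypothetical.

```python
def chi_square_variance_test(sample_var, hyp_var, n, chi2_upper):
    """Upper-tail test of a hypothetical population variance.
    chi2_upper is the tabled upper point of chi-square for n - 1 degrees of freedom."""
    chi2 = n * sample_var / hyp_var                # N * sigma^2 / sigma-bar^2
    critical_variance = chi2_upper * hyp_var / n   # boundary of the region of rejection
    return chi2, critical_variance, chi2 > chi2_upper

chi2, boundary, rejected = chi_square_variance_test(36_100, 32_400, 10, 16.919)
print(round(chi2, 3), round(boundary), rejected)   # 11.142 54818 False
```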

Other Examples. In testing hypotheses concerning the population variance, the statistician sometimes wishes to minimize the risk of accepting the hypothesis when in fact the true variance is below the hypothetical figure being tested. In a college course, for example, it might be claimed that a new type of examination will give a better spread in grades and hence be a better examination for grading the students. In testing the hypothetical variance claimed for the new type of examination, the statistician would want to minimize the risk of accepting this claim when in fact the true variance in grades is less than this figure. In such a case, the region of rejection would most appropriately be chosen to constitute the lower 5 per cent of the sampling distribution. For N = 10, for example, this would constitute all values of $\chi^2$ less than 3.325, or all samples whose variances are less than $3.325\,\bar{\sigma}^2/10$. This is illustrated in Fig. 87.

Still other cases might occur in which the statistician is indifferent as to whether the true variance is above or below the hypothetical variance being tested. In these cases, the most appropriate region of rejection would be one in which the total probability of .05 is equally distributed at both ends of the distribution. Unfortunately, the $\chi^2$ table has not been constructed so that the points marking the .025 tails of the distribution can be obtained without interpolation. If the coefficient of risk were set at .04, however, tails of .02 each could readily be determined from the existing table. Thus, if n = 9, a .04 region of rejection could be taken to constitute all values of $\chi^2$ less than 2.532 and all values of $\chi^2$ greater than 19.679. This region of rejection would comprise all samples whose variances are less than $2.532\,\bar{\sigma}^2/10$ or greater than $19.679\,\bar{\sigma}^2/10$. Such a region of rejection would afford an even balance between the risk of accepting the hypothesis when the true variance is less than the hypothetical figure and the risk of accepting it when the true variance is greater than this amount. This is illustrated in Fig. 88.
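The two-tailed .04 region can be expressed directly in units of variance. This sketch assumes the hypothetical variance 32,400 and N = 10 of the running example. It also assumes the tabled .02 points 2.532 and 19.679 of $\chi^2$ for 9 degrees of freedom. The text itself does not work out these cutoffs, and the function name is hypothetical.

```python
def two_tailed_variance_region(hyp_var, n, chi2_lower, chi2_upper):
    """Variance cutoffs of a two-tailed region of rejection: a sample whose variance
    falls below the first value or above the second leads to rejection."""
    return chi2_lower * hyp_var / n, chi2_upper * hyp_var / n

# .02 points of chi-square for n = 9 (tabled values); hypothetical variance 32,400
low, high = two_tailed_variance_region(32_400, 10, 2.532, 19.679)
sample_var = 36_100
print(round(low), round(high), sample_var < low or sample_var > high)   # 8204 63760 False
```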

Confidence Limits for the Population Variance. The Problem. In many cases no particular hypothesis regarding the population variance is to be tested. Instead, it is merely desired to determine a range of values within which, on the basis of a given



288 


ELEMENTARY THEORY OF RANDOM SAMPLING 



sample, the variance of the population may reasonably be expected to lie. It is with the determination of such confidence limits that the present section is concerned.

To make the problem concrete, suppose that 10 electric-light bulbs are manufactured by the new process referred to in the foregoing example and that the standard deviation in length of life of this sample is found to be 190 kilowatt-hours. This means a variance of 36,100 square kilowatt-hours. Suppose further that the manufacturer who is testing this new process wishes to run no greater chance than 4 out of 100 that the confidence limits set up will fail to cover the true value of the variance. That is, he wishes confidence limits whose confidence coefficient is .96.*

Determination of Confidence Limits. If the manufacturer is interested in both an upper and a lower bound for the population variance, he can set up confidence limits as follows: He can obtain an upper limit for $\bar{\sigma}^2$ by finding the value that makes the quantity $N\sigma^2/\bar{\sigma}^2$ just equal to the lower .02 point of the $\chi^2$ distribution. For if this value is the true value, then the probability of the sample $\sigma^2$ or some lower value will be just .02. Similarly, a lower bound for $\bar{\sigma}^2$ can be obtained by finding the value of $\bar{\sigma}^2$ that makes the quantity $N\sigma^2/\bar{\sigma}^2$ just equal to the upper .02 point of the $\chi^2$ distribution. For if this value is the true value, then the probability of getting the sample $\sigma^2$ or some higher value will be just .02. The values of $\bar{\sigma}^2$ so determined will constitute the confidence limits for this parameter, and the confidence interval so marked off will have a probability of .96 of covering the true value.

For the given data these upper and lower limits for $\bar{\sigma}^2$ are determined as follows: For n = 10 − 1 = 9, the lower .02 point of the $\chi^2$ distribution is 2.532. The upper limit for $\bar{\sigma}^2$ is thus given by $N\sigma^2/\bar{\sigma}^2 = 2.532$. This yields

$$\frac{(10)(36{,}100)}{\bar{\sigma}^2} = 2.532,$$

or $\bar{\sigma}^2 = 142{,}575$. Similarly, for n = 9, the upper .02 point of the $\chi^2$ distribution is 19.679, and the lower limit for $\bar{\sigma}^2$ is given by $N\sigma^2/\bar{\sigma}^2 = 19.679$, or $(10)(36{,}100)/\bar{\sigma}^2 = 19.679$. This yields $\bar{\sigma}^2 = 18{,}344$. The confidence interval is thus 18,344 to 142,575, and it may be said that there is a probability of .96 that this interval covers the true variance.

* The confidence coefficient is put at this figure instead of the usual .95 because the $\chi^2$ table gives the .02 points at each end instead of the usual .025 points.
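The two limits come from solving $N\sigma^2/\bar{\sigma}^2 = $ (tabled point) for $\bar{\sigma}^2$. Below is a minimal sketch in which the tabled .02 points are passed in as parameters. The helper name is hypothetical. Note that the lower $\chi^2$ point yields the upper limit and the upper point yields the lower limit.

```python
def variance_limits(sample_var, n, chi2_lower_point, chi2_upper_point):
    """Confidence limits for the population variance from N * sigma^2 / sigma-bar^2.
    The lower chi-square point gives the upper limit, and vice versa."""
    ns2 = n * sample_var
    return ns2 / chi2_upper_point, ns2 / chi2_lower_point

low, high = variance_limits(36_100, 10, 2.532, 19.679)   # .02 points for n = 9
print(round(low), round(high))                            # 18344 142575
```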

The foregoing interval was an unbiased confidence interval in that the probability of failing to cover the population value because the interval was set too low was equal to the probability of failing to cover the population value because the interval was set too high. If the manufacturer had wished to set up a confidence interval that had only an upper bound, he could have found that upper bound by setting $N\sigma^2/\bar{\sigma}^2$ equal to the .95 point of the $\chi^2$ distribution for n = N − 1 (the value exceeded with probability .95). For the given data this yields $10(36{,}100)/\bar{\sigma}^2 = 3.325$, or $\bar{\sigma}^2 = 108{,}571$. Hence the manufacturer might say that the range 0 to 108,571 has a probability of .95 of covering the true value. That is, in cases of this kind the manufacturer would go wrong only 5 per cent of the time in assuming that the population variance was equal to or below the upper bound. It will be noted here that the confidence interval established in this instance had a confidence coefficient of .95 instead of .96 as in the previous example. The former was adopted because the $\chi^2$ table lacks a .96 point.

In conclusion, it should be noted once again that, the larger the sample, the narrower the confidence interval. If a sample of 20 with the same variance had been taken instead of a sample of 10, the limits for $\bar{\sigma}^2$ would have been 21,433 and 84,277. Thus, the larger the sample, the greater the assurance that the sample variance is near the true variance. It must be remembered, however, that a larger sample may involve increased expense.
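The one-sided bound and the effect of doubling the sample can be checked the same way. The sketch assumes the sample of 20 has the same variance, 36,100. It uses the tabled .98 and .02 points of $\chi^2$ for 19 degrees of freedom (8.567 and 33.687), which reproduce the limits quoted in the text. The function name is hypothetical.

```python
def variance_upper_bound(sample_var, n, chi2_95_point):
    """One-sided upper bound: the interval runs from 0 to N * sigma^2 / chi2_95_point,
    where chi2_95_point is the chi-square value exceeded with probability .95."""
    return n * sample_var / chi2_95_point

print(round(variance_upper_bound(36_100, 10, 3.325)))   # 108571

# Same sample variance with N = 20 (n = 19); tabled .98 and .02 points 8.567 and 33.687
ns2 = 20 * 36_100
print(round(ns2 / 33.687), round(ns2 / 8.567))          # 21433 84277
```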

SPECIAL METHOD FOR LARGER SAMPLES 

As indicated in the preceding chapter, when the sample is large the sampling distribution of the variance is approximately normal. Its mean is practically the variance of the population (more exactly, $\frac{N-1}{N}\bar{\sigma}^2$), and its standard deviation is $\bar{\sigma}^2\sqrt{2/N}$. Hence for large samples, say samples greater than 30, any hypothetical value of $\bar{\sigma}^2$ may be tested by noting the value of

$$\frac{\sigma^2 - \bar{\sigma}^2}{\bar{\sigma}^2\sqrt{2/N}}.$$

If a symmetrical region of rejection is adopted
with a probability of .05, then all values of $(\sigma^2 - \bar{\sigma}^2)/(\bar{\sigma}^2\sqrt{2/N})$ outside of ±1.96 will lead to a rejection of the hypothesis. If the region of rejection is put at one end, the hypothesis will be rejected if this quantity is greater than 1.645 in one instance or if it is less than −1.645 in another instance. Similarly, confidence limits with a confidence coefficient of .95 may be determined by setting $(\sigma^2 - \bar{\sigma}^2)/(\bar{\sigma}^2\sqrt{2/N})$ equal to ±1.96 if both an upper and a lower bound are desired, or to 1.645 if a lower bound only is desired, or to −1.645 if an upper bound only is desired.

To illustrate the use of this special method for large samples, suppose that the batch of electric-light bulbs of the previous sample had numbered 98 instead of 10. Let the sample variance be 36,100 square kilowatt-hours, and consider the hypothesis that the true variance is 32,400 (that is, $\bar{\sigma} = 180$). If the region of rejection is taken as the upper .05 tail of the normal curve, then this hypothesis may be tested by finding the value of $(\sigma^2 - \bar{\sigma}^2)/(\bar{\sigma}^2\sqrt{2/N})$. For the given data this is equal to

$$\frac{36{,}100 - 32{,}400}{32{,}400\sqrt{2/98}} = .80.$$

Since this is less than 1.645, the sample does not fall in the region of rejection, and the hypothesis is not rejected.
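The large-sample statistic is a one-line computation. Here is a sketch in which the function name is hypothetical.

```python
import math

def large_sample_variance_z(sample_var, hyp_var, n):
    """(sigma^2 - sigma-bar^2) / (sigma-bar^2 * sqrt(2 / N)), treated as a normal deviate."""
    return (sample_var - hyp_var) / (hyp_var * math.sqrt(2 / n))

z = large_sample_variance_z(36_100, 32_400, 98)
print(round(z, 2), z > 1.645)    # 0.8 False
```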

To determine confidence limits for the population variance in this instance, the procedure is as follows: Let the confidence coefficient be .96 so that the result will be comparable with that of the preceding section. Also, let the upper bound of the confidence interval be determined so that, if the population variance has a value equal to this upper bound, then the probability of the sample result or a lower value is just .02, and let the lower bound be determined in the same manner. Then these upper and lower limits will be given by

$$\frac{36{,}100 - \bar{\sigma}^2}{\bar{\sigma}^2\sqrt{2/98}} = \pm 2.054.$$

This yields $\bar{\sigma}^2 = 27{,}911$ and $\bar{\sigma}^2 = 51{,}090$ as the limits of the confidence interval. It will be noted that this interval is still smaller than that for which N = 10 or N = 20.
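Solving $(\sigma^2 - \bar{\sigma}^2)/(\bar{\sigma}^2\sqrt{2/N}) = \pm z$ gives $\bar{\sigma}^2 = \sigma^2/(1 \pm z\sqrt{2/N})$, and the sketch below evaluates that expression. It takes z from `statistics.NormalDist` (about 2.0537 for .96) instead of the rounded 2.054. As a result it gives about 27,911 and 51,089, and the small difference from the text's 51,090 is presumably due to rounding. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_variance_limits(sample_var, n, confidence=0.96):
    """Solve (sigma^2 - v) / (v * sqrt(2/N)) = +/- z for the population variance v.
    Requires z * sqrt(2/N) < 1, which holds for the large samples considered here."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 2.054 for .96
    k = z * math.sqrt(2 / n)
    return sample_var / (1 + k), sample_var / (1 - k)

low, high = large_sample_variance_limits(36_100, 98)
print(round(low), round(high))
```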

Maximum-likelihood Estimate of the Population Variance.

So far maximum-likelihood estimates of population parameters






have been found to be the sample values of the corresponding statistics. Thus the maximum-likelihood estimate of the percentage of “favorable” cases in the population is the percentage of “favorable” cases in the sample. Similarly, whether or not the variance of the population is known, the maximum-likelihood estimate of the mean of the population is the mean of the sample. In the present instance, however, the maximum-likelihood estimate of the population variance that can be made independently of the sample mean¹ is not the sample variance but the sample variance multiplied by $N/(N-1)$. The derivation of this result is as follows:

By definition, the maximum-likelihood estimate of a population parameter is the value of the parameter that will make the probability of the given sample a maximum. For an assigned value of the population variance, the probability of a sample having any given variance may be determined from the sampling distribution of the variance. The equation for this distribution is Eq. (3) of Chap. X. If a particular value is given to $\boldsymbol{\sigma}^2$ in this equation, it will yield the probability of getting samples with various values of $\sigma^2$. If, however, the value of a given sample variance is substituted in this equation, the formula yields the probability of getting this particular sample for various possible values of $\boldsymbol{\sigma}^2$. Figure 89a, for example, shows the probability of a sample variance of 100 when the population variance is 180, the size of the sample being 11. Figure 89b shows (for $N = 11$) the probability of a sample variance of 100 when the population variance is 80, and Fig. 89c shows this

¹ When the mean and standard deviation are estimated jointly, the maximum-likelihood estimate of the population variance is simply the variance of the sample.
probability when the population variance is equal to $\frac{N}{N-1}\sigma^2$, that is, to 110. Further calculations will show that the probability of the given $\sigma^2$ is greater when $\boldsymbol{\sigma}^2$ has this last value than when it has any other value.¹ This is indicated by the curve shown in Fig. 90. The optimum estimate of $\boldsymbol{\sigma}^2$ when $\sigma^2 = 100$ and $N = 11$ is thus 110.




The algebraic counterpart of this geometrical explanation is as follows. For a given $\sigma^2$, Eq. (3) of Chap. X gives the probability of that $\sigma^2$ as a function of $\boldsymbol{\sigma}^2$. If this probability is to be a maximum, its derivative with respect to $\boldsymbol{\sigma}^2$ must be zero at the maximizing value. To simplify

¹ For simplicity the argument speaks of "the probability of the given $\sigma^2$." A more precise expression would be "the probability of a sample variance lying between the given $\sigma^2$ and the given $\sigma^2 + d\sigma^2$." Again, "the probability of a sample variance of 100" is merely a short way of saying "the probability of a sample variance lying between 100 and $100 + d\sigma^2$." Although the true probability is an area, only the ordinate changes from case to case in this argument, and it is therefore all that needs to be considered here.
matters, the logarithm of the probability is differentiated instead of the probability itself. This does not alter the result, since the probability will be a maximum when its logarithm is a maximum. Thus, from Eq. (3), Chap. X,

$$\log P(\sigma^2) = -\frac{N\sigma^2}{2\boldsymbol{\sigma}^2} - \frac{N-1}{2}\log\boldsymbol{\sigma}^2 + \text{terms not involving } \boldsymbol{\sigma}^2.$$

Differentiating this with respect to $\boldsymbol{\sigma}^2$ and setting the result equal to 0 yields the following:

$$\frac{d\log P(\sigma^2)}{d\boldsymbol{\sigma}^2} = \frac{N\sigma^2}{2(\boldsymbol{\sigma}^2)^2} - \frac{N-1}{2\boldsymbol{\sigma}^2} = 0$$

or

$$\boldsymbol{\sigma}^2 = \frac{N}{N-1}\,\sigma^2.$$
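The maximizing value can also be confirmed numerically, in the spirit of Fig. 90. The sketch assumes the standard result for samples from a normal population, namely that $N\sigma^2/\boldsymbol{\sigma}^2$ follows a chi-square distribution with $N-1$ degrees of freedom (the form Eq. (3) of Chap. X is taken to express). It keeps only the terms that involve $\boldsymbol{\sigma}^2$ and searches a grid of candidate values. The function name is hypothetical.

```python
import math

def log_likelihood(pop_var, s2, n):
    """Log of the sampling density of a sample variance s2 (divisor n),
    keeping only the terms that involve pop_var.

    Assumes n * s2 / pop_var ~ chi-square(n - 1), the standard result for
    a normal population.
    """
    return -(n - 1) / 2 * math.log(pop_var) - n * s2 / (2 * pop_var)

s2, n = 100, 11
candidates = [v / 10 for v in range(500, 3001)]   # pop_var from 50.0 to 300.0 in steps of .1
best = max(candidates, key=lambda v: log_likelihood(v, s2, n))
print(best, n * s2 / (n - 1))   # 110.0 110.0
```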

It may seem strange at first that the optimum value of $\boldsymbol{\sigma}^2$ is not the value that makes the given $\sigma^2$ fall at the mode, or peak, of the sampling distribution. On closer examination, however, the sampling distribution of $\sigma^2$ changes not only its position but also its shape as the value assigned to $\boldsymbol{\sigma}^2$ changes. The height at the mode of one curve may thus be less than the height at some other point on another curve. In the present instance, the height of the curve given by $\boldsymbol{\sigma}^2 = \frac{N}{N-1}\sigma^2$ at the point $\sigma^2$ turns out to be greater than the maximum height of the curve whose mode occurs at $\sigma^2$.
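The comparison of heights can be made concrete for $\sigma^2 = 100$ and $N = 11$. Assume, as is standard for normal samples, that $N\sigma^2/\boldsymbol{\sigma}^2$ follows chi-square with $N-1$ degrees of freedom. Then the density of $\sigma^2$ has its mode at $(N-3)\boldsymbol{\sigma}^2/N$, so the curve whose mode falls at 100 is the one with $\boldsymbol{\sigma}^2 = 137.5$. The sketch evaluates the full density, normalizing constants included, with `math.lgamma`. The function name is hypothetical.

```python
import math

def sample_var_density(s2, pop_var, n):
    """Density of the sample variance s2 (divisor n) from a normal population.

    Uses n * s2 / pop_var ~ chi-square(n - 1) and the change of variables
    y = n * s2 / pop_var, whose Jacobian is n / pop_var.
    """
    k = n - 1
    y = n * s2 / pop_var
    log_chi2 = (k / 2 - 1) * math.log(y) - y / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)
    return math.exp(log_chi2) * n / pop_var

s2, n = 100, 11
mle_var = n * s2 / (n - 1)    # 110: the maximum-likelihood value
mode_var = n * s2 / (n - 3)   # 137.5: puts the mode of the curve at s2 = 100
height_mle = sample_var_density(s2, mle_var, n)
height_mode = sample_var_density(s2, mode_var, n)
print(round(height_mle, 5), round(height_mode, 5), height_mle > height_mode)
```

The second curve is tallest at 100, yet the first curve is taller still at that point, which is exactly the situation the text describes.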

The explanation of this somewhat unexpected result for the maximum-likelihood estimate of $\boldsymbol{\sigma}^2$ lies in the fact that the variance of a sample is always measured from the mean of the sample and not from the mean of the population. Since the mean itself varies from sample to sample, any estimate of the population variance that ignores this variation in the mean will tend to underestimate the true value. It will be seen in Chap. XIV that, when the mean and the variance of the population are estimated jointly, the joint maximum-likelihood estimates are $\bar{\boldsymbol{X}} = \bar{X}$ and $\boldsymbol{\sigma}^2 = \sigma^2$.

SAMPLING DISTRIBUTION OF THE RANGE 

Table XIII of the Appendix,¹ giving pertinent facts regarding the sampling distribution of the range, may be used to estimate the population standard deviation quickly. The values given in the table are for $w = (X_n - X_1)/\boldsymbol{\sigma}$, and the table also gives the mean values of $w$ in samples of varying size. If a given sample range is assumed to be close to the mean range for samples of that size, then $(X_n - X_1)/\bar{w}$ will give a rough estimate of $\boldsymbol{\sigma}$. Suppose, for example, that the range of a sample of 10 cases is 50. Since 3.078 is the tabled mean of $w$ for samples of 10, then 50/3.078, or approximately 50(.325) = 16.25, is a crude estimate of $\boldsymbol{\sigma}$. Furthermore, the .025 points of Table XIII show that the chances are 95 out of 100 that the interval from 50/4.79 to 50/1.67 (that is, from 10.44 to 29.94) includes the standard deviation of the population.
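The arithmetic can be sketched as follows. The constants 3.078 (the mean of $w$ for samples of 10) and 1.67 and 4.79 (the lower and upper .025 points) are the Table XIII values quoted in the text. The function name is hypothetical.

```python
def sigma_from_range(sample_range, mean_w, w_lower, w_upper):
    """Rough estimate of the population standard deviation from one sample range,
    plus the interval implied by the tabled percentage points of w."""
    estimate = sample_range / mean_w
    # A large w-point corresponds to a small sigma, so the bounds swap.
    interval = (sample_range / w_upper, sample_range / w_lower)
    return estimate, interval

est, (lo, hi) = sigma_from_range(50, 3.078, 1.67, 4.79)
print(round(est, 2), round(lo, 2), round(hi, 2))   # 16.24 10.44 29.94
```

The text's 16.25 comes from rounding 1/3.078 to .325.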

If all the data are available, good results involving little labor can be obtained as follows. Break up a large sample into a number of smaller samples of the same size, calculate the range for each of these samples, take the average of these sample ranges, and use this average range to establish limits for the population standard deviation. Suppose, for example, that 10 samples of eight cases each showed ranges of 30, 24, 16, 28, 11, 20, 22, 35, 17, and 25. The mean of these 10 sample ranges is 22.8. Table XIII shows that, if the population standard deviation is $\boldsymbol{\sigma}$, the mean range for samples of 8 is $2.847\boldsymbol{\sigma}$. It also shows that the standard error of the range is $.820\boldsymbol{\sigma}$. Hence the standard error of a mean of 10 sample ranges would be $.820\boldsymbol{\sigma}/\sqrt{10}$. In the present instance, therefore, the .95 confidence interval for $\boldsymbol{\sigma}$ is given by

$$\frac{\sqrt{10}\,(22.8 - 2.847\boldsymbol{\sigma})}{.820\boldsymbol{\sigma}} = \pm 1.96$$

(the latter being the upper and lower .025 probability points of the normal curve). The confidence interval for $\boldsymbol{\sigma}$ is accordingly 6.8 to 9.7.

¹ See p. 482. Note: $X_n$ = largest case, $X_1$ = smallest case.
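Solving the equation above for $\boldsymbol{\sigma}$ at +1.96 and -1.96 gives the two limits directly. The following is a minimal sketch using the Table XIII constants quoted in the text (2.847 and .820 for samples of 8). The function name is hypothetical.

```python
import math
import statistics

def sigma_interval_from_mean_range(ranges, d2, se_factor, z=1.96):
    """Confidence interval for sigma from the mean of k sample ranges.

    Solves sqrt(k) * (mean_range - d2*sigma) / (se_factor*sigma) = +/- z,
    where d2*sigma is the mean range and se_factor*sigma its standard error.
    """
    k = len(ranges)
    m = statistics.fmean(ranges)
    c = z * se_factor / math.sqrt(k)
    return m, m / (d2 + c), m / (d2 - c)

m, lo, hi = sigma_interval_from_mean_range(
    [30, 24, 16, 28, 11, 20, 22, 35, 17, 25], d2=2.847, se_factor=0.820)
print(round(m, 1), round(lo, 1), round(hi, 1))   # 22.8 6.8 9.7
```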

The sampling distribution of the range may also be used for quick analysis in certain problems in which the variation in the means of particular groups is being tested for significance.¹ For example, suppose that five classes of 12 students each have mean grades of 77.25, 77.83, 70.32, 69.92, and 7…0, and that the ranges of individual grades for the five classes are 31, 34, 24, 44, and 44 points. On the basis of the variation of grades indicated by the ranges, is the variation in mean grades more than might reasonably be attributed to chance? This question may readily be answered as follows.

The mean of the five ranges is 35.4 points. Table XIII of the Appendix shows that for groups of 12 the mean range is $3.258\boldsymbol{\sigma}$. Hence 35.4/3.258 = 10.86 may serve as a rough estimate of the population standard deviation $\boldsymbol{\sigma}$. Now if $\boldsymbol{\sigma} = 10.86$, then means of 12 grades should have a standard deviation of $10.86/\sqrt{12} = 3.135$. Table XIII of the Appendix shows that, for groups of five, the mean range for a variable whose standard deviation is 3.135 is $2.326 \times 3.135 = 7.292$. For the five mean grades given above, the range is 7.91, which is about what would be expected on the basis of chance. Hence the analysis of the variation in mean grades, based on the ranges of grades in the individual classes, indicates that the variation in means is apparently no greater than may reasonably be attributed to chance. This is identical with the conclusion reached in Chap. XVI after a more refined analysis of variance. The rougher analysis presented here is especially valuable as a preliminary step, for throwing out those cases in which it shows the variation in means to be clearly not due to chance.
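Here is a sketch of this range-based check, using the Table XIII constants quoted in the text (3.258 for groups of 12, 2.326 for groups of 5). Because the fifth class mean is partly illegible in the source, the sketch takes the observed range of the means, 7.91, from the text instead of computing it. The function name is hypothetical.

```python
import math
import statistics

def expected_range_of_means(class_ranges, group_size, d2_group, d2_means):
    """Expected range of k class means if only chance variation is present.

    sigma is estimated from the mean within-class range (mean range = d2_group * sigma);
    a class mean then has standard deviation sigma / sqrt(group_size), and the
    mean range of k such means is d2_means times that.
    """
    sigma_hat = statistics.fmean(class_ranges) / d2_group
    sd_mean = sigma_hat / math.sqrt(group_size)
    return sigma_hat, sd_mean, d2_means * sd_mean

sigma_hat, sd_mean, expected = expected_range_of_means(
    [31, 34, 24, 44, 44], group_size=12, d2_group=3.258, d2_means=2.326)
observed = 7.91   # range of the five class means, as given in the text
print(round(sigma_hat, 2), round(sd_mean, 3), round(expected, 2), observed)
```

Carried to more decimals than the text, the expected range comes out near 7.30 rather than 7.292. The conclusion is unchanged.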

The various percentage points of the sampling distribution of the range may also be useful in setting up charts to control

¹ See Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, Chap. XV, for discussion of correlation ratio. Also, see Chap. XVI below on Analysis of Variance.
the quality of industrial mass production. An interesting discussion of the use of the range in industrial quality control may be found in reports issued by the British Standards Institution.¹

Sampling Distributions of $\sqrt{\beta_1}$ and $\beta_2$ (or $g_1$ and $g_2$) and $a$ to Test for Departure from Normality. As indicated in the previous chapter, the sampling distributions of $\sqrt{\beta_1}$ and $\beta_2$ (or of $g_1$ and $g_2$, if the worker prefers these more refined statistics) and of $a = \text{A.D.}/\sigma$ may be used to test departure from normality. For illustrative purposes, consider the data on the weights of 300 freshmen² discussed in Chap. VII.

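For these data,

$$\sqrt{\beta_1} = .637, \qquad \beta_2 = 4.6720, \qquad g_1 = .635, \qquad g_2 = 1.7320,$$

and the significance of the departure from normality may be checked as follows. For samples of 300,

$$\sigma_{\sqrt{\beta_1}} = \sqrt{\frac{6}{N}} = \sqrt{\frac{6}{300}} = .141, \qquad \sigma_{\beta_2} = \sqrt{\frac{24}{N}} = \sqrt{\frac{24}{300}} = .283,$$

and the ratios of $\sqrt{\beta_1}$ and $\beta_2 - 3$ to their standard errors are

$$\frac{.637}{.141} = 4.5 \qquad\text{and}\qquad \frac{4.6720 - 3}{.283} = 5.9.$$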
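The two ratios can be recomputed from the large-sample standard errors $\sqrt{6/N}$ and $\sqrt{24/N}$ used in the text. This sketch covers only those approximations, not the more accurate standard errors of $g_1$ and $g_2$. The function name is hypothetical.

```python
import math

def normality_ratios(sqrt_b1, b2, n):
    """Ratios of sqrt(beta1) and beta2 - 3 to their large-sample standard
    errors sqrt(6/n) and sqrt(24/n)."""
    se_skew = math.sqrt(6 / n)
    se_kurt = math.sqrt(24 / n)
    return sqrt_b1 / se_skew, (b2 - 3) / se_kurt

skew_ratio, kurt_ratio = normality_ratios(0.637, 4.6720, 300)
print(round(skew_ratio, 1), round(kurt_ratio, 1))   # 4.5 5.9
```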


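Both of these ratios indicate a marked departure from normality, since the probability that either $\sqrt{\beta_1}$ or $\beta_2 - 3$ exceeds its standard error by so large a multiple is very small. For samples as large as 300, $g_1$ and $g_2$ and their standard errors are almost the same as $\sqrt{\beta_1}$ and $\beta_2 - 3$ and their standard errors, and the results are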

¹ See, for example, British Standards Institution, No. 600 (1942); and Pearson, E. S., The Application of Statistical Methods to Industrial Standardization and Quality Control, British Standards Institution, No. 600 (1935).

² See pp. 138, 141.
not modified by the use of the more accurate standard errors of $g_1$ and $g_2$. Table XI of the Appendix¹ also shows that the sample value $\beta_2 = 4.6720$ is considerably beyond the upper 1 per cent point, and Table XII of the Appendix² shows that the sample value $a = .7640$ is below the lower 1 per cent point for $a$. Hence all the tests indicate a distinct departure from normality.

¹ See p. 480.

² See p. 481.




CHAPTER XII 


SAMPLING FLUCTUATIONS 
IN CORRELATION STATISTICS 

SAMPLING DISTRIBUTION OF THE CORRELATION COEFFICIENT r 

Correlation coefficients have sampling distributions that differ from any discussed so far. Suppose a large number of samples are taken at random from a normal bivariate population,¹ and a frequency distribution is made of the sample values of $r_{12}$. The shape of this distribution will then be found to conform to the equation²


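$$dP(r_{12}) = \frac{(1 - \boldsymbol{r}_{12}^2)^{\frac{N-1}{2}}\,(1 - r_{12}^2)^{\frac{N-4}{2}}}{\pi\,(N-3)!}\;\frac{d^{\,N-2}}{d(\boldsymbol{r}_{12} r_{12})^{N-2}}\left(\frac{\cos^{-1}(-\boldsymbol{r}_{12} r_{12})}{\sqrt{1 - \boldsymbol{r}_{12}^2 r_{12}^2}}\right) dr_{12} \tag{1}$$

where $\boldsymbol{r}_{12}$, in boldface type, refers to the population correlation coefficient and

$$\frac{d^{\,N-2}}{d(\boldsymbol{r}_{12} r_{12})^{N-2}}\left(\frac{\cos^{-1}(-\boldsymbol{r}_{12} r_{12})}{\sqrt{1 - \boldsymbol{r}_{12}^2 r_{12}^2}}\right)$$

means the $(N-2)$th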


¹ A sample from a monovariate, or univariate, population consists of a group of individual values; a sample from a bivariate population consists of a group of pairs of values. For example, a sample from the univariate population consisting of white male heights might be represented by the measurements 68, 70, 69, 74, 70, 71, 72, 71, 68, 69 inches, each of which represents the height of a particular individual. A sample from the bivariate population consisting of white male heights and weights might be represented by pairs of measurements:

 68   70   69   74   70   71   72   71   68   69   inches
140  150  149  206  165  183  190  175  154  142   pounds

each pair consisting of the height and weight of a particular individual. Each of the above samples is said to be a sample of 10, although the second contains 20 measurements. They are each a sample of 10 in the sense that measurements are made of only 10 individuals.

² Cf. Fisher, R. A., "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, Vol. 1 (1920-1921), Part 4, pp. 3-32.
derivative of the expression in parentheses with respect to $\boldsymbol{r}_{12} r_{12}$.

It is to be especially noted that the shape of this distribution depends not only on $N$, the size of the samples, but also on the population coefficient of correlation, $\boldsymbol{r}_{12}$. For small values of $\boldsymbol{r}_{12}$ the distribution is practically symmetrical;¹ for large absolute values of $\boldsymbol{r}_{12}$, however, the distribution is very skewed. This is illustrated in Fig. 91. The character of the sampling distribution of $r$ is in part related to the fact that $r_{12}$ and $\boldsymbol{r}_{12}$ must always lie between -1 and +1. Hence if $\boldsymbol{r}_{12}$ is close to +1, say .94, then the range of values above $\boldsymbol{r}_{12}$ that might possibly be taken by the sample $r_{12}$ is very small compared with the range

and a standard deviation equal approximately to $1/\sqrt{N-3}$.


¹ For $\boldsymbol{r}_{12} = 0$ it is perfectly symmetrical.

² The distribution of $z$ may be obtained by substituting

$$r_{12} = \frac{e^{2z} - 1}{e^{2z} + 1}$$

in Eq. (1). The mean of the distribution of $z$ is always, except when $\boldsymbol{r}_{12} = 0$,
Accordingly, if this "$z$ transformation" is used, a sampling distribution can be obtained that, for all but very small samples, is practically independent of the population correlation coefficient and is also approximately normal in form. The $z$ transformation thus makes it unnecessary to use elaborate tables of the distribution of $r_{12}$ in making estimates and testing hypotheses regarding the population correlation coefficient.

Again, all the foregoing depends on the assumption that the population of $X_1$ and $X_2$ values is distributed in the form of a joint normal frequency distribution.¹
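The approximate normality of $z$ and its $1/\sqrt{N-3}$ standard deviation can be checked by simulation. The sketch assumes a joint normal population, as the text requires, generated with Python's `random.gauss`. The sample size 20, the population value .9, the number of repetitions, and the seed are arbitrary illustrative choices, and the function names are hypothetical.

```python
import math
import random
import statistics

def sample_r(n, rho, rng):
    """Pearson r for n pairs from a standard bivariate normal with correlation rho."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1) for x in xs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simulate_z(n, rho, reps=4000, seed=1):
    """Mean and standard deviation of z = atanh(r) over repeated samples."""
    rng = random.Random(seed)
    zs = [math.atanh(sample_r(n, rho, rng)) for _ in range(reps)]
    return statistics.fmean(zs), statistics.stdev(zs)

mean_z, sd_z = simulate_z(n=20, rho=0.9)
print(round(mean_z, 3), round(math.atanh(0.9), 3))      # simulated mean vs population z
print(round(sd_z, 3), round(1 / math.sqrt(20 - 3), 3))  # simulated sd vs 1/sqrt(N - 3)
```

The simulated mean should sit slightly above the population $z$, consistent with the small "bias" mentioned in the footnote.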


USE OF THE SAMPLING DISTRIBUTION OF z = tanh⁻¹ r

In testing hypotheses and setting up confidence limits regarding correlation coefficients, the most practicable procedure is to make use of the $z$ transformation. As pointed out above, the sampling distribution of $z$ is practically normal in form, and tables of the normal distribution are generally available. Since $z$ is approximately normally distributed, all the previous discussion of normally distributed statistics applies here, including, for example, the discussion of the sampling fluctuations of the mean when the population standard deviation is known. The following discussion will therefore be devoted primarily to illustrating the use of the $z$ transformation itself.

Testing a Hypothesis. In Smith and Duncan's Elementary Statistics and Applications, the simple correlation coefficient between the standing of Mount Holyoke students in first-semester English and their standing in second-semester English was found to be $r_{12} = .89576$. A table of hyperbolic tangents (see Appendix, Table XIV) shows that the angle for which

slightly greater in absolute value than the population value of $z_{12}$, the "bias" being given roughly by $\boldsymbol{r}_{12}/[2(N-1)]$.

A more accurate formula that indicates the closeness of the approximation for the standard deviation is as follows:


cr 




N - 3 + 


1 

MMX - 3)rf 2 


1 For nonnormal cases see discussion, p. 451. 
.89576 is the hyperbolic tangent is approximately 1.4506. (It will be recalled that $r_{12} = \tanh z_{12}$, or $z_{12} = \tanh^{-1} r_{12}$.) If tables are not available, $z$ may be calculated from the equation

$$z_{12} = \tfrac{1}{2}\left[\log_e(1 + r_{12}) - \log_e(1 - r_{12})\right] = 1.1513\left[\log_{10}(1 + r_{12}) - \log_{10}(1 - r_{12})\right].$$

For the given case this yields $z = 1.45049$, which checks to three places with the value derived from the table.¹ Here the sample value of $z_{12}$ will be taken as that given by the table, viz., $z_{12} = 1.4506$.

Suppose it is argued that the true correlation coefficient between first- and second-semester English grades is .9900. How does this hypothesis fare with reference to the sample coefficient of .89576? To test this hypothesis, convert both these correlation coefficients to their corresponding $z$ values. The table of hyperbolic tangents indicates that $\tanh^{-1} .9900 = 2.6465$ and, as previously, $\tanh^{-1} .89576 = 1.4506$. Hence the hypothetical $z$ is 2.6465, and the sample $z$ is 1.4506. The standard deviation of $z$ is $\sigma_z = 1/\sqrt{N-3} = 1/\sqrt{81-3} = .11323$. Since $z$ is normally distributed, symmetrical regions of rejection would begin at the hypothetical $z$ plus or minus $1.96\sigma_z$. In this particular case they begin at $2.6465 \pm .22193$, that is, at 2.8684 and 2.4246. Since the sample value falls below 2.4246, the hypothesis would not be accepted.
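The transformation and the test can be reproduced with the standard library's `math.atanh`. Direct computation gives $\tanh^{-1} .89576 \approx 1.4503$, a little below both the tabled 1.4506 and the logarithmic 1.45049. Such discrepancies arise from the decimals carried and from interpolation, and they do not affect the conclusion. The function name is hypothetical.

```python
import math

def z_test_correlation(r_sample, r_hyp, n, z_crit=1.96):
    """Two-sided test of a hypothetical population correlation via z = atanh(r)."""
    z_sample = math.atanh(r_sample)
    z_hyp = math.atanh(r_hyp)
    sd_z = 1 / math.sqrt(n - 3)
    lower, upper = z_hyp - z_crit * sd_z, z_hyp + z_crit * sd_z
    rejected = not (lower <= z_sample <= upper)
    return z_sample, z_hyp, sd_z, (lower, upper), rejected

z_s, z_h, sd_z, (lo, hi), rejected = z_test_correlation(0.89576, 0.99, 81)
print(round(z_s, 4), round(z_h, 4), round(sd_z, 5))
print(round(lo, 4), round(hi, 4), rejected)
```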

Confidence Interval for $\boldsymbol{r}_{12}$. To determine a confidence interval for $\boldsymbol{r}_{12}$ from the given sample, it is necessary merely to determine a confidence interval for the population $\boldsymbol{z}_{12}$ and convert it into limits for $\boldsymbol{r}_{12}$. Suppose, then, that symmetrical confidence limits are desired, symmetrical in the sense that failure to cover the true value is equally probable at either end. For $\boldsymbol{z}_{12}$ these are given by the sample $z_{12} \pm 1.96\sigma_z$. In the present instance,

1 The difference is due to the difference in decimals carried, methods of 
interpolation, etc., and not to any difference in definition. 
$$\sigma_z = \frac{1}{\sqrt{N-3}} = \frac{1}{\sqrt{81-3}} = .11323 \qquad\text{and}\qquad 1.96\sigma_z = .22193.$$

Accordingly, the upper confidence limit for $\boldsymbol{z}_{12}$ is

$$1.4506 + .22193 = 1.6725,$$

and the lower confidence limit for $\boldsymbol{z}_{12}$ is

$$1.4506 - .22193 = 1.2287.$$

Interpolating in a table of hyperbolic tangents shows that the hyperbolic tangent of 1.6725 is .93188 and the hyperbolic tangent of 1.2287 is .84220. These values may also be obtained from the inverse of the equation for the $z$ transformation given above, viz.,

$$r_{12} = \frac{e^{2z_{12}} - 1}{e^{2z_{12}} + 1}.$$

For the upper limit, $2z_{12}$ equals 3.3450. Then $\log_{10} e^{3.3450}$ equals $3.3450(.43429) = 1.45270$, and $e^{3.3450}$ equals the antilogarithm of 1.45270, or 28.359. Consequently, the upper limit for $\boldsymbol{r}_{12}$ equals $27.359/29.359 = .93188$, which checks the interpolation in the table. Similar calculations would check the .84220 lower limit for $\boldsymbol{r}_{12}$.

In summary, the confidence limits for $\boldsymbol{r}_{12}$ are .84220 and .93188. It may thus be concluded that the range .84220 to .93188 covers the true value of $\boldsymbol{r}_{12}$ with a probability of .95. To put it another way, any hypothetical value of $\boldsymbol{r}_{12}$ below .84220 or above .93188 is deemed an unreasonable value in view of the sample value $r_{12} = .89576$.
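The confidence limits follow from applying `math.tanh` to the two $z$ limits. This sketch lets the caller supply the tabled $z$ (1.4506) so that the text's figures are reproduced. Omitting it uses `math.atanh` instead, which shifts the limits only in the fourth decimal. The function name is hypothetical.

```python
import math

def r_confidence_limits(r_sample, n, z_crit=1.96, z_sample=None):
    """Symmetrical confidence limits for a population correlation via atanh/tanh.

    z_sample may be supplied to reproduce a tabled value (the text uses 1.4506).
    """
    if z_sample is None:
        z_sample = math.atanh(r_sample)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z_sample - half_width), math.tanh(z_sample + half_width)

lo, hi = r_confidence_limits(0.89576, 81, z_sample=1.4506)
print(round(lo, 5), round(hi, 5))
```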

Confidence limits could also be set up, and hypotheses tested, in such a way that the region of rejection all came at one end. This would duplicate the discussion of Chap. XI, however, and need not be repeated here. As mentioned above, once $r_{12}$ has been transformed into $z_{12}$, all the analysis pertaining to normally distributed variables can be applied.

Optimum Estimate of the Population Correlation Coefficient.

An estimate of the population correlation coefficient that is independent of the estimates of the means and variances of the population can be made from the sampling distribution of $r_{12}$. The independent maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is obtained by choosing $\boldsymbol{r}_{12}$ so that the probability of the sample $r_{12}$, as given by Eq. (1), is a maximum.¹ This yields the approximate result²

$$\boldsymbol{r}_{12} = r_{12} - \frac{r_{12}(1 - r_{12}^2)}{2(N-1)}. \tag{2}$$

That is, the sample $r_{12}$ is somewhat too high (in absolute value) as an estimate of the population correlation coefficient and needs to be corrected by subtracting the quantity $r_{12}(1 - r_{12}^2)/[2(N-1)]$. For example, if $r_{12} = .89576$ and $N = 81$, then the maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is

$$.89576 - \frac{.89576(1 - .89576^2)}{2(80)} = .89465.$$

When $\boldsymbol{r}_{12}$ is estimated jointly with the means and variances of the population, the maximum-likelihood estimate of $\boldsymbol{r}_{12}$ is found to be the sample $r_{12}$.
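Formula (2) is a one-line computation. The following sketch reproduces the text's example. The function name is hypothetical, and the formula is only the first approximation described in the footnote.

```python
def r_ml_approx(r, n):
    """First-approximation maximum-likelihood estimate of the population
    correlation, made independently of the means and variances:
    r - r * (1 - r**2) / (2 * (n - 1))."""
    return r - r * (1 - r * r) / (2 * (n - 1))

print(round(r_ml_approx(0.89576, 81), 5))   # 0.89465
```

For a negative $r$ the correction changes sign, so the estimate is always pulled toward zero.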


SAMPLING DISTRIBUTION
OF OTHER CORRELATION COEFFICIENTS

Partial Correlation Coefficients. The analysis described in the preceding section for a simple correlation coefficient can be applied with very little modification to partial correlation coefficients. The only change required in the formulas is to subtract from $N$ the number of variables held constant. As before, the practicable procedure is to take $z = \tanh^{-1} r_{ij.k\ldots}$ and to treat $z$ as if it were normally distributed. The mean of the distribution of $z$ is again approximately the value of $z$ corresponding to the population correlation coefficient $\boldsymbol{r}_{ij.k\ldots}$, that is, $\boldsymbol{z} = \tanh^{-1}\boldsymbol{r}$. The standard deviation of $z$, however, is now the reciprocal of the square root of $N - m - 3$, where $m$ is the number of variables held constant.


¹ See pp. 181-182.

² This is obtained by taking the derivative of expression (1) with respect to $\boldsymbol{r}_{12}$ and setting the result equal to zero. The formula is only a first approximation. Better approximations may be found in Karl Pearson's Tables for Statisticians and Biometricians, Vol. II, p. 253. Another approximate equation that is commonly used is

$$\boldsymbol{r}_{12}^2 = \frac{r_{12}^2(N-1) - 1}{N-2}.$$
To illustrate, consider the determination of confidence limits for the $\boldsymbol{r}_{12.34}$ of the Mount Holyoke data. For these data it was found that $r_{12.34} = .80440$. For this value of $r$, the value of $z$ is 1.1110. The standard deviation of $z$ in this instance is

$$\sigma_z = \frac{1}{\sqrt{81 - 2 - 3}} = .1147.$$

Accordingly, symmetrical .95 confidence limits for $\boldsymbol{z}$ are $1.1110 + 1.96(.1147)$ and $1.1110 - 1.96(.1147)$, or 1.3358 and 0.8862. The $\boldsymbol{r}_{12.34}$ limits corresponding to these two $z$ limits are .8706 and .7095, respectively.

Hypotheses regarding $\boldsymbol{r}_{12.34}$ can be tested, and estimates made, in the same manner as for simple correlation coefficients, except that the number of variables held constant ($m$) must always be subtracted from $N$ in all the equations used.
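The partial-correlation limits differ from the simple case only in the degrees of freedom. Here is a minimal sketch with $m = 2$ variables held constant, as in the text. The function name is hypothetical.

```python
import math

def partial_r_limits(r, n, m, z_crit=1.96):
    """Symmetrical confidence limits for a partial correlation with m variables
    held constant, treating z = atanh(r) as normal with sd 1 / sqrt(n - m - 3)."""
    z = math.atanh(r)
    sd_z = 1 / math.sqrt(n - m - 3)
    z_lo, z_hi = z - z_crit * sd_z, z + z_crit * sd_z
    return z, sd_z, math.tanh(z_lo), math.tanh(z_hi)

z, sd_z, r_lo, r_hi = partial_r_limits(0.80440, n=81, m=2)
print(round(z, 4), round(sd_z, 4), round(r_lo, 4), round(r_hi, 4))
```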

Multiple Correlation Coefficients. When the population multiple correlation coefficient is zero, the statistic

$$\frac{R^2/(k-1)}{(1 - R^2)/(N-k)}$$

has a sampling distribution of the form of the $F$ distribution, with $n_1 = k - 1$ and $n_2 = N - k$. Here $R$ is the multiple correlation coefficient, $k$ is the number of regression statistics ($a_{1.234}$, $b_{12.3\ldots}$, $b_{13.2\ldots}$, etc.) in the regression equation, and $N$ is the size of the sample. Hence the $F$ distribution can be used to test whether a given multiple correlation coefficient is significantly different from zero.

For the Mount Holyoke data, for example, it was found that $R^2_{4.123} = .2529$, and the question arises whether this is significantly different from zero. To answer it, the hypothesis that the population value is zero (the "null hypothesis," as it may be called) is set up and tested as follows. The number of independent variables is three, and the number of regression statistics is one more than this, or four. Hence $k = 4$ and

$$\frac{R^2}{k-1} = \frac{.2529}{3} = .0843.$$

Since $N = 81$,

$$\frac{1 - R^2}{N - k} = \frac{.7471}{77} = .00970.$$

Therefore,

$$\frac{R^2/(k-1)}{(1 - R^2)/(N-k)} = \frac{.0843}{.0097} = 8.69.$$

The $F$ table (see Appendix, page 476) shows that, for $n_1 = 3$ and $n_2 = 60$, the .05 point is 2.758 and, for $n_1 = 3$ and $n_2 = \infty$, the .05 point is 2.605. Accordingly, the .05 point for $n_1 = 3$ and $n_2 = 77$ lies between 2.758 and 2.605. Since the sample value is much greater than this, the null hypothesis is clearly rejected, and the multiple correlation coefficient must be declared significantly different from zero. That is, there must be some multiple correlation between the variables concerned.
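The $F$ statistic is simple to compute. The sketch stops at the statistic and compares it with the tabled .05 points quoted in the text, since the Python standard library has no $F$ distribution. The function name is hypothetical.

```python
def f_for_multiple_r(r_squared, n, k):
    """F statistic for testing a zero population multiple correlation,
    with n1 = k - 1 and n2 = n - k degrees of freedom."""
    n1, n2 = k - 1, n - k
    return (r_squared / n1) / ((1 - r_squared) / n2), n1, n2

f, n1, n2 = f_for_multiple_r(0.2529, n=81, k=4)
print(round(f, 2), n1, n2)   # 8.69 3 77
# The .05 point for n2 = 77 lies between 2.605 and 2.758 (from the text's F table).
print(f > 2.758)             # True
```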

When the population multiple correlation coefficient is not zero, the distribution of the sample multiple correlation coefficient is more complicated. It was worked out by R. A. Fisher in 1928.¹ Using Fisher's analysis, Mordecai Ezekiel has drawn a series of charts that give lower confidence limits for the population multiple correlation coefficient, for samples of varying size and varying numbers of independent variables. The confidence coefficient used was .95. These charts are in the Appendix to Ezekiel's book Methods of Correlation Analysis. For $N = 75$ and $k = 4$, for example, Ezekiel's Chart C shows that, if the sample multiple correlation is .57, the lower confidence limit for the population value is .40. For the Mount Holyoke data the various multiple correlation coefficients are $R_{1.234} = .9056$, $R_{2.134} = .8983$, $R_{3.124} = .6326$, and $R_{4.123} = .5029$. For each of these, $N = 81$ and $k = 4$. Ezekiel's Chart C shows that the lower confidence limits for these are approximately .85, .85, .49, and .32, respectively. In other words, the chance is .95 that the ranges .85 to 1, .85 to 1, .49 to 1, and .32 to 1 cover the population value in each case.

The maximum-likelihood estimate of the multiple correlation coefficient is given by the approximate formula

$$\boldsymbol{R}^2 = \frac{R^2(N-1) - (k-1)}{N-k}.$$

¹ Proceedings of the Royal Society of London, Series A (1928), Vol. 121, pp. 654-673.
Thus the maximum-likelihood estimate of $\boldsymbol{R}^2_{1.234}$ is

$$\frac{.9056(80) - 3}{77} = .9019.$$


Correlation Ratio. If the correlation ratio for the population is zero, then the statistic

$$\frac{\eta^2/(k-1)}{(1 - \eta^2)/(N-k)}$$

also has a sampling distribution of the form of an $F$ distribution, with $n_1 = k - 1$ and $n_2 = N - k$. Here $k$ stands for the number of means entering into the calculation of $\eta_{12}$.

To illustrate the use of the sampling distribution of $\eta_{12}^2$, consider again the Mount Holyoke data. For these, $\eta_{12}^2 = .82124$, $k = 11$, $N = 81$, and thus

$$\frac{.82124/10}{.17876/70} = \frac{.082124}{.002553} = 32.168.$$

To test the hypothesis that the true correlation ratio is 0, take a coefficient of risk equal to .05 and take the upper .05 tail as the region of rejection. For this case, $n_1 = 11 - 1 = 10$ and $n_2 = 81 - 11 = 70$. For $n_1 = 8$ and $n_2 = 60$, the $F$ table shows that the upper .05 point is 2.097; for $n_1 = 8$ and $n_2 = \infty$, the upper .05 point is 1.938. For $n_1 = 12$ and $n_2 = 60$, the upper .05 point is 1.918; for $n_1 = 12$ and $n_2 = \infty$, it is 1.752. Interpolation for $n_1 = 10$ and $n_2 = 70$ is unnecessary, since 32.168 is obviously far beyond the upper .05 point for this problem. The correlation ratio is therefore certainly significantly different from zero.
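The correlation-ratio statistic has the same form, with $\eta^2$ in place of $R^2$. In the sketch below (function name hypothetical), the unrounded denominator gives about 32.16. The text's 32.168 comes from rounding the denominator to .002553.

```python
def f_for_eta(eta_squared, n, k):
    """F statistic for testing a zero population correlation ratio,
    with n1 = k - 1 and n2 = n - k, k being the number of means."""
    n1, n2 = k - 1, n - k
    return (eta_squared / n1) / ((1 - eta_squared) / n2), n1, n2

f, n1, n2 = f_for_eta(0.82124, n=81, k=11)
print(round(f, 1), n1, n2)   # 32.2 10 70
print(f > 2.097)             # True: beyond the largest bracketing .05 point quoted
```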

Correlation Index. If a curve such as a parabola is fitted to the original data and the correlation index for the population is zero, then the statistic

$$\frac{I^2/(k-1)}{(1 - I^2)/(N-k)}$$

is likewise distributed like $F$ with $n_1 = k - 1$ and $n_2 = N - k$, $k$ being the number of regression statistics in the equation for the curve. When logarithms or reciprocals are used to make a distribution linear, the correlation between the logarithms or reciprocals can be treated as any ordinary linear correlation coefficient.

Test for Linearity. Since a correlation ratio and a correlation index are never less than the correlation coefficient calculated for the same data, the question may be asked: How can it be determined when a distribution is really linear and when it is nonlinear? This question can be answered by a sampling test. The null hypothesis is set up that the population is linear, and this hypothesis is tested in the light of the difference between the correlation coefficient and the correlation ratio (or the correlation index) for the given set of sample data.

If the population is a normal bivariate distribution in which the progression of means is actually linear, then the statistic

$$\frac{(\eta^2 - r^2)/(k-2)}{(1 - \eta^2)/(N-k)} \qquad\text{or the statistic}\qquad \frac{(I^2 - r^2)/(k-2)}{(1 - I^2)/(N-k)}$$

is distributed like $F$ with $n_1 = k - 2$ and $n_2 = N - k$. Here $k$ is the number of means from which the correlation ratio is computed or, in the case of $I$, the number of regression statistics in the equation of the curve.

For the Mount Holyoke example a correlation index was not computed, but the correlation ratio was calculated, and a test for linearity can be made by the above procedure as follows:

$$r^2 = .80239, \quad \eta^2 = .82123, \quad k = 11, \quad N = 81, \quad n_1 = 9, \quad n_2 = 70.$$

Hence

$$\frac{(.82123 - .80239)/9}{.17877/70} = \frac{.00209}{.00255} = .8196.$$

The .05 point in the $F$ distribution for $n_1 = 9$ and $n_2 = 70$ lies between 1.752 and 2.097. Thus .8196 is well within that limit, and the correlation is presumably linear.
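The linearity statistic can be sketched the same way. The bracketing .05 points 1.752 and 2.097 are the tabled values quoted in the text. The function name is hypothetical.

```python
def f_for_linearity(eta_squared, r_squared, n, k):
    """F statistic for the null hypothesis that the regression is linear:
    ((eta^2 - r^2) / (k - 2)) / ((1 - eta^2) / (n - k)), with n1 = k - 2, n2 = n - k."""
    n1, n2 = k - 2, n - k
    f = ((eta_squared - r_squared) / n1) / ((1 - eta_squared) / n2)
    return f, n1, n2

f, n1, n2 = f_for_linearity(0.82123, 0.80239, n=81, k=11)
print(round(f, 3), n1, n2)   # 0.82 9 70
print(f < 1.752)             # True: below even the smaller bracketing .05 point
```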




PART III 


Advanced Sampling Problems 

CHAPTER XIII 

SAMPLING FROM A DISCRETE MANIFOLD POPULATION 

Chapter IX was concerned with sampling from a discrete twofold population. The purpose of the present chapter is to extend that discussion to a discrete manifold population, that is, a population divided into more than two classes. The theory of sampling from a discrete manifold population has widespread application, for in many discrete populations the cases are grouped into several classes. Furthermore, the cases of any continuously distributed population can be, and commonly are, arbitrarily grouped into a finite number of classes.

The theoretical argument pertaining to manifold populations is fundamentally an extension of the argument pertaining to a twofold population. The ensuing analysis will accordingly follow the same line of attack as the earlier chapter. The argument will be presented in full, but the parts that are identical with the earlier argument will be reviewed only briefly. Attention will center primarily on the complications that arise from a manifold, instead of a twofold, division of the population.

THEORETICAL ARGUMENT 

Fundamental Assumptions. The fundamental assumptions are the same as in the simpler case. It is first assumed that the sample is small relative to the population from which it is drawn. It can therefore be assumed that the percentages of the cases belonging to the various classes of the population are not changed by the withdrawal of the sample values. This will, of course, be only approximately true if the population is actually finite, and perfectly true only for an infinite population.
Although the sample is assumed to be small relative to the population, it is nevertheless assumed to be large enough to permit the use of certain approximations. Some of the theoretical illustrations use very small samples for the sake of simplicity. In the discussion of the practical application of the argument, however, the samples are assumed to be large in the aggregate. As a general rule, the product of the population percentage of any class and the size of the sample (that is, $p_i N$) is assumed to be at least 5.

A second fundamental assumption is that the method of sampling is random. This again implies that the results of repeated sampling by the selected method can be predicted by calculating probabilities for some mathematical model. The appropriate model will be described in the next section.

In order to make the analysis concrete, suppose the problem 
is that of estimating the percentages of various types of religious 
adherents in a given locality. The alternatives are taken 
to be Catholic, Protestant, and other religious denominations, 
including atheists, and the problem will be to estimate by means 
of a random sample the percentages of these various religious 
adherents in the given population. 

The Sampling Distribution of Percentages. In making inferences about the percentages of various attributes in a manifold population, the sample statistics that naturally offer themselves for this purpose are the sample percentages of these attributes. Another statistic, more commonly used in large samples, will be described later; for the moment, attention will be centered on the sample percentages. To use these percentages in testing hypotheses and similar tasks, their sampling distribution must be derived. This will be the principal task of the present section.

Derivation. The analysis of Chap. IX suggests that the sampling distribution of percentages can most conveniently be derived from the following mathematical model: Suppose there are 10 packs of cards, in each of which there are $p_1$ per cent red, $p_2$ per cent white, and $p_3$ per cent blue cards. Let all possible combinations of $N$ cards each (here $N = 10$) be made by combining without restriction each card in any given pack with a single card from each of the other packs. Since the probability of a red card in each pack is $p_1$, the probability of a white card is $p_2$, and the probability of a blue card is $p_3$; and since combinations are formed without restriction, so that the selection of any card from any one pack does not affect the probabilities in other packs, then by the multiplication theorem the probability of a combination having, for example, the first three cards red, the next five white, and the last two blue is $p_1^3 p_2^5 p_3^2$.

The probability just indicated is true, however, for any combination in which there are three red, five white, and two blue cards. The total number of such combinations is given by the combinatorial formula¹

$$C = \frac{N!}{N_1!\,N_2!\,N_3!}$$

which, for the above example, equals

$$\frac{10!}{3!\,5!\,2!} = 2{,}520$$

Accordingly, the total probability of such a combination would be

$$\frac{10!}{3!\,5!\,2!}\,p_1^3 p_2^5 p_3^2$$


For $N$ packs of cards and for combinations containing $N$ cards, the probability of a combination containing $N_1$ red, $N_2$ white, and $N_3$ blue cards would be

$$\frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1} p_2^{N_2} p_3^{N_3}$$

where $N_1 + N_2 + N_3 = N$.

The last probability may be taken as a good prediction of the relative frequency with which samples of $N$ would have $N_1$ Catholics, $N_2$ Protestants, and $N_3$ other denominations in the whole set of samples of size $N$ that might, by repeated sampling (with replacements²), be drawn at random from the given population. This is true because, if the process of sampling is random, it is reasonable to suppose that every particular combination of persons will appear as frequently as every other combination, so that the relative frequency of samples of $N$ having $N_1$ Catholics, $N_2$ Protestants, and $N_3$ other denominations will tend to be the same as the relative frequency of combinations of $N_1$ red, $N_2$ white, and $N_3$ blue cards among the set of all possible combinations of $N$ cards that might be made by selecting a card from each of $N$ different packs. The formula for the sampling distribution of percentages from a threefold population is thus

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1} p_2^{N_2} p_3^{N_3}$$

or, for a manifold population,

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \ldots, \frac{N_k}{N}\right) = \frac{N!}{N_1!\,N_2!\cdots N_k!}\,p_1^{N_1} p_2^{N_2}\cdots p_k^{N_k}$$

It will be recognized that this is merely a generalization of the formula for the binomial distribution. Its technical name is the “multinomial distribution,” since it is also the equation for the terms of a multinomial expansion, that is, of $(p_1 + p_2 + \cdots + p_k)^N$.

¹ See pp. 24-26.

² Since the population is finite, each case would have to be “put back” so as to be eligible for withdrawal on the next draw. This need only be true, however, for the theoretical argument. In practice it is assumed that for very large populations the effect of not making replacements is so slight that it can be ignored. Cf. pp. 187-188.
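The formula can be checked numerically. The following is a minimal Python sketch, not part of the original text; the helper names `multinomial_coef` and `multinomial_prob` are our own. It uses exact fractions so that the card example above, and the fact that the probabilities of all possible combinations add up to 1 (the multinomial expansion with $p_1 + p_2 + p_3 = 1$), can be confirmed without rounding error. The pack composition used for the second check is an arbitrary illustrative assumption.

```python
from fractions import Fraction
from math import factorial

def multinomial_coef(counts):
    """Number of combinations N!/(N1! N2! ... Nk!) with the given class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)  # each partial quotient is still a whole number
    return coef

def multinomial_prob(counts, probs):
    """Exact multinomial probability N!/(N1!...Nk!) * p1**N1 * ... * pk**Nk."""
    prob = Fraction(multinomial_coef(counts))
    for p, c in zip(probs, counts):
        prob *= Fraction(p) ** c
    return prob

# Card example from the text: 10 packs, 3 red, 5 white, 2 blue.
print(multinomial_coef((3, 5, 2)))  # 2520

# Illustrative (assumed) pack composition: 50% red, 30% white, 20% blue.
probs = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
total = sum(
    multinomial_prob((r, w, 10 - r - w), probs)
    for r in range(11)
    for w in range(11 - r)
)
print(total)  # 1, i.e. the expansion of (p1 + p2 + p3)**10
```

Exact fractions are used rather than floats so that equality with 1 can be asserted directly.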

The Multinomial Distribution. In the case of the binomial distribution, a sample could be described by a single percentage figure, $N_1/N$, since the other percentage is necessarily $N_2/N = 1 - N_1/N$; and as soon as $N_1/N$ was specified, the type of sample was fully designated. A graph of the binomial distribution could therefore be reduced to an ordinary two-dimensional graph. For a multinomial distribution, however, this is not possible, for there are at least two independent percentages that characterize each sample. A graph of a multinomial distribution must therefore be multidimensional.

Illustration of Symmetrical Multinomial Distribution. A concrete illustration of a very simple multinomial distribution is shown in Fig. 92. This is a graph of the multinomial distribution for which $N = 5$, $k = 3$, and $p_1 = p_2 = p_3 = \frac{1}{3}$. Accordingly, the graph represents the distribution of probabilities of various types of samples of size 5 drawn at random from a threefold population in which the percentages of cases in the three classes are all equal. The equation for this particular distribution is

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{3}\right)^{N_3} = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{3}\right)^{5}$$

and the values of $P(N_1/N, N_2/N, N_3/N)$ given by this equation for various values of $N_1/N$, $N_2/N$, and $N_3/N$ are recorded in Table 30.
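Table 30 can be regenerated mechanically. In this sketch (Python, standard library only; the helper names are ours, not the book's) every way of splitting five cases among three classes is enumerated. Because each probability equals the multinomial coefficient times $(1/3)^5 = 1/243$, the coefficient itself is the numerator in 243ds.

```python
from math import factorial

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_coef(counts):
    """N!/(N1! N2! ... Nk!) for the given class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef

N = 5
# Key: sample percentages (N1/N, N2/N, N3/N); value: probability in 243ds.
table30 = {
    tuple(c / N for c in counts): multinomial_coef(counts)
    for counts in compositions(N, 3)
}

for sample, numerator in sorted(table30.items(), key=lambda item: item[1]):
    print(sample, f"{numerator}/243")
```

The 21 keys are exactly the sample points marked in the diagram, and the numerators add up to 243.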


Since there are three classes, the graph of this particular multinomial distribution is drawn in three dimensions. Each point in the three-dimensional diagram represents a particular set of values for $N_1/N$, $N_2/N$, and $N_3/N$. Those combinations of values which are possible under the conditions of the problem are marked by large dots. The probabilities of getting these various combinations, expressed in terms of 243ds, are written slightly below and to the right or to the left of the dots.

Table 30. The Multinomial Distribution Representing the Sample Percentages for Which $N = 5$, $k = 3$, and $p_1 = p_2 = p_3 = \frac{1}{3}$

          Type of sample                   Probability
 (1) $N_1/N$   (2) $N_2/N$   (3) $N_3/N$    $P(N_1/N, N_2/N, N_3/N)$
    1.0           .0            .0              1/243
    .0            1.0           .0              1/243
    .0            .0            1.0             1/243
    .8            .2            .0              5/243
    .8            .0            .2              5/243
    .2            .0            .8              5/243
    .2            .8            .0              5/243
    .0            .8            .2              5/243
    .0            .2            .8              5/243
    .6            .4            .0             10/243
    .6            .0            .4             10/243
    .0            .6            .4             10/243
    .4            .6            .0             10/243
    .4            .0            .6             10/243
    .0            .4            .6             10/243
    .6            .2            .2             20/243
    .2            .6            .2             20/243
    .2            .2            .6             20/243
    .4            .4            .2             30/243
    .4            .2            .4             30/243
    .2            .4            .4             30/243

First, it will be noticed that only a limited number of $N_1/N$, $N_2/N$, and $N_3/N$ combinations are possible. This is because $N_1/N$, $N_2/N$, and $N_3/N$ must always add up to 1. The various values of $N_1/N$, $N_2/N$, and $N_3/N$ are thus subject to the condition that

$$\frac{N_1}{N} + \frac{N_2}{N} + \frac{N_3}{N} = 1$$

This condition, as the mathematicians put it, restricts the “degrees of freedom” in selecting $N_1/N$, $N_2/N$, and $N_3/N$ values. Without it there would be three degrees of freedom (three dimensions) for the selection of $N_1/N$, $N_2/N$, and $N_3/N$ values; with it the degrees of freedom are reduced to two (two dimensions). Geometrically this means that the $N_1/N$, $N_2/N$, and $N_3/N$ values must be confined to the slanting plane ABC, the equation of which is the condition $N_1/N + N_2/N + N_3/N = 1$. This particular aspect of the problem is emphasized here because the concept of degrees of freedom will enter into most of the subsequent sampling analysis. If it is mastered now, the student should have little trouble with what is to come.

The symmetry of the given multinomial distribution will also be noted. Those combinations that have the most even distribution of cases among the three classes (.4, .4, .2; .4, .2, .4; and .2, .4, .4) have the highest probability of occurrence (30/243); the extremely uneven combinations (1.0, 0, 0; 0, 1.0, 0; and 0, 0, 1.0) have the lowest probability (1/243); and similarly for the intermediate combinations. This symmetry is due to the equality of $p_1$, $p_2$, and $p_3$. A subsequent example will demonstrate what happens to the distribution when the $p$'s are unequal.

Means and Standard Deviations of a Multinomial Distribution. The general similarity of the multinomial distribution to the binomial distribution (indeed the latter is merely a special case of the former) suggests that the equations for the means and standard deviations of the multinomial distribution will be of the same type as those of the binomial distribution. This is in fact the case. The mean or expected percentage of cases falling in class 1 is $p_1$, the mean percentage of cases falling in class 2 is $p_2$, and in general the mean percentage of cases falling in class $k$ is $p_k$. The standard deviation of the percentage of cases falling in class 1 is¹

$$\sigma_{N_1/N} = \sqrt{\frac{p_1(1 - p_1)}{N}}$$

the standard deviation of the percentage of cases falling in class 2 is

$$\sigma_{N_2/N} = \sqrt{\frac{p_2(1 - p_2)}{N}}$$

and, in general, the standard deviation of the percentage of cases falling in class $k$ is

$$\sigma_{N_k/N} = \sqrt{\frac{p_k(1 - p_k)}{N}}$$

No general proof will be given of these equations. Their validity may be tested, however, by applying them to the data of Table 30 and by comparing the mean and standard deviation thus obtained with the mean and standard deviation obtained by the use of conventional methods of calculation.

By definition the mean of a variable $X_i$ which has the relative frequencies $p_i$ is

$$\bar{X} = \sum p_i X_i$$

The mean value of $N_1/N$ (i.e., the mean percentage of cases falling in class 1) is accordingly

$$\bar{X} = \sum P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right)\frac{N_1}{N}$$

Numerically, the value of $\bar{X}$ for class 1 can thus be obtained by multiplying the figures of column (1) of Table 30 by the corresponding probabilities given in the last column. The results of these row-by-row calculations are given in column (1) of Table 31.

Table 31. Calculation of the Mean and Standard Deviation of the Percentage of Cases Falling in Class 1
(Multinomial distribution representing the sample percentages for which $N = 5$, $k = 3$, and $p_1 = p_2 = p_3 = \frac{1}{3}$; $P$ denotes $P(N_1/N, N_2/N, N_3/N)$)

 Type of sample
 ($N_1/N$, $N_2/N$, $N_3/N$)    (1) $P \cdot (N_1/N)$    (2) $P \cdot (N_1/N)^2$
   1.0, .0, .0                     1/243                   1/243
   .0, 1.0, .0                     0                       0
   .0, .0, 1.0                     0                       0
   .8, .2, .0                      4/243                   3.2/243
   .8, .0, .2                      4/243                   3.2/243
   .2, .0, .8                      1/243                   .2/243
   .2, .8, .0                      1/243                   .2/243
   .0, .8, .2                      0                       0
   .0, .2, .8                      0                       0
   .6, .4, .0                      6/243                   3.6/243
   .6, .0, .4                      6/243                   3.6/243
   .0, .6, .4                      0                       0
   .4, .6, .0                      4/243                   1.6/243
   .4, .0, .6                      4/243                   1.6/243
   .0, .4, .6                      0                       0
   .6, .2, .2                     12/243                   7.2/243
   .2, .6, .2                      4/243                   .8/243
   .2, .2, .6                      4/243                   .8/243
   .4, .4, .2                     12/243                   4.8/243
   .4, .2, .4                     12/243                   4.8/243
   .2, .4, .4                      6/243                   1.2/243
   Total                          81/243                  37.8/243

The sum of column (1) is therefore the mean value of $N_1/N$. Consequently, $\bar{X}$ for $N_1/N$ is $81/243 = \frac{1}{3}$.

¹ For the binomial case the corresponding equation is

$$\sigma_{N_1/N} = \sqrt{\frac{p_1(1 - p_1)}{N}} = \sqrt{\frac{p_1 p_2}{N}}$$

since $p_1 + p_2 = 1$.
By use of a short method the standard deviation of the sample values of $N_1/N$ (i.e., the sample percentages of cases falling in class 1) can be calculated from Table 30 by the formula¹

$$\sigma_{N_1/N} = \sqrt{\sum P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right)\left(\frac{N_1}{N}\right)^2 - \bar{X}^2}$$

It will be recognized that the first term under the square root is merely the sum of the squares of the percentages of column (1), Table 30, multiplied by the corresponding probabilities. Each of these row-by-row products is given in column (2), Table 31, and their sum is the sum of this column, viz., 37.8/243. Hence,

$$\sigma_{N_1/N} = \sqrt{\frac{37.8}{243} - \left(\frac{81}{243}\right)^2} = .21$$

The same results are obtained by the use of the formulas, for $\bar{X}$ calculated above is $\frac{1}{3}$, which equals $p_1 = \frac{1}{3}$. Using the equation for the standard deviation,

$$\sigma_{N_1/N} = \sqrt{\frac{\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}{5}} = \sqrt{\frac{2}{45}} = .21$$
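The comparison of the two methods can also be done by brute force. The sketch below (Python, standard library; helper names such as `share_moments` are ours) sums $P \cdot N_1/N$ and $P \cdot (N_1/N)^2$ over all 21 samples, exactly as columns (1) and (2) of Table 31 do, and compares the resulting standard deviation with $\sqrt{p_1(1-p_1)/N}$.

```python
from fractions import Fraction
from math import factorial, sqrt

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Exact multinomial probability of the class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = Fraction(coef)
    for p, c in zip(probs, counts):
        prob *= p ** c
    return prob

def share_moments(n, probs, cls=0):
    """Mean, sum of P*(share)^2, and SD of N_cls/N over all samples of size n."""
    mean = Fraction(0)    # total of column (1), Table 31
    second = Fraction(0)  # total of column (2), Table 31
    for counts in compositions(n, len(probs)):
        p = multinomial_prob(counts, probs)
        share = Fraction(counts[cls], n)
        mean += p * share
        second += p * share ** 2
    return mean, second, sqrt(second - mean ** 2)

third = Fraction(1, 3)
mean, second, sd = share_moments(5, (third, third, third))
formula_sd = sqrt(third * (1 - third) / 5)
print(mean, float(second * 243), round(sd, 4), round(formula_sd, 4))  # 1/3 37.8 0.2108 0.2108
```

Both routes give a variance of exactly 2/45, so the two standard deviations agree to the last digit.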


An Illustration of a Skewed Multinomial Distribution. Another example will now be used to illustrate a skewed multinomial distribution.

Suppose that there are again three classes, that the probability of a single case falling in class 1 is $p_1 = \frac{1}{2}$, that the probability of a case falling in class 2 is $p_2 = \frac{1}{3}$, and that the probability of a case falling in class 3 is $p_3 = \frac{1}{6}$; again suppose that five cases are chosen at random. Under these conditions the probability of getting $N_1/N$ cases in class 1, $N_2/N$ cases in class 2, and $N_3/N$ cases in class 3 is as follows:

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{5!}{N_1!\,N_2!\,N_3!}\left(\frac{1}{2}\right)^{N_1}\left(\frac{1}{3}\right)^{N_2}\left(\frac{1}{6}\right)^{N_3}$$

¹ This is merely the short formula

$$\sigma = \sqrt{\sum p_i X_i^2 - \bar{X}^2}$$

where $P(N_1/N, N_2/N, N_3/N)$ corresponds to $p_i$ and $N_1/N$ corresponds to $X_i$.
The values of $P(N_1/N, N_2/N, N_3/N)$ for various values of $N_1/N$, $N_2/N$, and $N_3/N$ are given in Table 32, and a diagrammatic representation of the distribution of probabilities is shown in Fig. 93.

[Fig. 93. Probabilities are expressed in terms of 7,776ths.]


Since the condition $N_1/N + N_2/N + N_3/N = 1$ holds for this case as for the preceding one, the various combinations of $N_1/N$, $N_2/N$, and $N_3/N$ are again constrained to lie in the slanting plane ABC; that is, the degrees of freedom are only two. The distribution of probabilities is no longer symmetrical, however. Since the probability of a single case falling in the first class is greater than the probability of a single case falling in either the second or the third class, the multinomial distribution shows a major bunching of cases in the direction of the $N_1/N$-axis; and since the probability of a case falling in the second class is greater than the probability of a case falling in the third class, the distribution shows a minor bunching of cases in the direction of the $N_2/N$-axis.

All this is clearly revealed in Fig. 93. It is to be noted, however, that the mean and standard deviation formulas continue to hold true for this skewed form. Accordingly, the mean of $N_1/N$ equals $p_1 = \frac{1}{2}$, and the standard deviation of $N_1/N$ equals

$$\sigma_{N_1/N} = \sqrt{\frac{p_1(1 - p_1)}{N}} = \sqrt{\frac{\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)}{5}} = \sqrt{.05} = .2236$$

and similar calculations will give the means and standard deviations of $N_2/N$ and $N_3/N$.
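A sketch of the same enumeration for the skewed case (Python, standard library; helper names are ours) reproduces Table 32 in 7,776ths and confirms, from the table itself, that the mean of $N_1/N$ is $\frac{1}{2}$ and its standard deviation is $\sqrt{.05}$.

```python
from fractions import Fraction
from math import factorial, sqrt

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Exact multinomial probability of the class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = Fraction(coef)
    for p, c in zip(probs, counts):
        prob *= p ** c
    return prob

N = 5
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))
scale = 6 ** N  # 7,776

# Table 32: probability of each sample (N1, N2, N3), in 7,776ths.
table32 = {}
for counts in compositions(N, 3):
    in_7776ths = multinomial_prob(counts, probs) * scale
    assert in_7776ths.denominator == 1  # every entry is a whole number of 7,776ths
    table32[counts] = in_7776ths.numerator

# Mean and standard deviation of N1/N, computed from the table itself.
mean1 = sum(Fraction(v, scale) * Fraction(c[0], N) for c, v in table32.items())
second1 = sum(Fraction(v, scale) * Fraction(c[0], N) ** 2 for c, v in table32.items())
sd1 = sqrt(second1 - mean1 ** 2)

print(mean1, round(sd1, 4))  # 1/2 0.2236
```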

Use of Multinomial Distribution in Testing Hypotheses. Like the binomial distribution, the multinomial distribution may be used to test hypotheses regarding the true division of cases in a population. Consider, for example, the following problem: Suppose there are three candidates for election to the same office. An inquiring reporter stops five persons at random and asks which candidate they favor. The results obtained are as follows:

 Candidate    Number of Persons Favoring Specified Candidate
    A                  3
    B                  2
    C                  0
Table 32. Multinomial Distribution Representing the Distribution of Sample Percentages for Which $N = 5$, $k = 3$, and $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{3}$, and $p_3 = \frac{1}{6}$

          Type of sample                   Probability
    $N_1/N$       $N_2/N$       $N_3/N$     $P(N_1/N, N_2/N, N_3/N)$
    1.0           .0            .0             243/7,776
    .0            1.0           .0              32/7,776
    .0            .0            1.0              1/7,776
    .8            .2            .0             810/7,776
    .8            .0            .2             405/7,776
    .0            .8            .2              80/7,776
    .2            .8            .0             240/7,776
    .2            .0            .8              15/7,776
    .0            .2            .8              10/7,776
    .6            .4            .0           1,080/7,776
    .6            .0            .4             270/7,776
    .0            .6            .4              80/7,776
    .4            .6            .0             720/7,776
    .4            .0            .6              90/7,776
    .0            .4            .6              40/7,776
    .6            .2            .2           1,080/7,776
    .2            .6            .2             480/7,776
    .2            .2            .6             120/7,776
    .4            .4            .2           1,080/7,776
    .4            .2            .4             540/7,776
    .2            .4            .4             360/7,776
On the assumption that the population from which this sample is taken is relatively large, do these results disprove the hypothesis that the population as a whole is equally divided with respect to the three candidates? That is, does the sample division of .6, .4, and .0 disprove the hypothesis of a true division of .33+, .33+, and .33+?

To answer this question it is necessary, as in the twofold case, first to decide upon a coefficient of risk of rejecting a hypothesis when it is really true. Let this be .10 in the present instance. In other words, the investigating organization is willing to run the risk of rejecting a true hypothesis 10 times out of 100.

The second step is to derive the distribution of sample percentages for samples of 5 from the assumed population. Since in the present instance the hypothesis is that $p_1 = p_2 = p_3 = \frac{1}{3}$, the desired sampling distribution is that described by the multinomial distribution of Table 30 and Fig. 92.

The third step is to select a group of unusual samples that may constitute a “region of rejection.” In this particular instance the distribution of samples is discrete and the number of cases is small, so that approximation by a continuous distribution is likely to be very inaccurate. It may accordingly be impossible to find any region that has a probability exactly equal to the adopted coefficient of risk. In the present problem it seems reasonable to proceed as follows: In Fig. 94, where the plane ABC of Fig. 92 is laid out flat, all the more unusual samples (i.e., those with the lowest probabilities) are seen to lie on or outside of a given circle that is ruled double in the diagram. The total probability of this group is .135. Although the region consisting of the circle and the area outside of it would thus constitute a region of rejection with a probability greater than the adopted coefficient of risk, this probability is not much greater.

Let it be assumed that the investigating organization is indifferent to the other values of the population percentages that might be true, i.e., that it is willing to run the same chance of accepting the given hypothesis in whatever direction the true percentages might happen to lie relative to the assumed percentages. Under these circumstances, a symmetrical region of rejection, such as the circular area just described, should be the type of region adopted. In view of this situation, therefore, it will be assumed that the investigating body raises its coefficient of risk to .135 and adopts the given region as its region of rejection.

The final step in the analysis is to note that the given sample (3,2,0) does not fall in the region of rejection. The investigating organization consequently concludes that the hypothesis of an equal division of sentiment regarding the three candidates cannot be rejected as a result of the knowledge obtained from the sample.

[Figure. Probabilities are expressed in terms of 243ds.]
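The test just described can be sketched in a few lines. Under the hypothesis $p_1 = p_2 = p_3 = \frac{1}{3}$ a sample's probability is proportional to its multinomial coefficient, so in this sketch (Python; helper names are ours) the region of rejection is taken to be the least probable sample points, those with probability at most 5/243. For this symmetrical case that set coincides with the points on or outside the double-ruled circle; in a skewed case the two constructions need not agree. Its exact size is 33/243, which the text gives as .135.

```python
from fractions import Fraction
from math import factorial

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_coef(counts):
    """N!/(N1! N2! ... Nk!) for the given class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef

N = 5
# Rejection region: the least probable samples, i.e. probability <= 5/243.
region = {c for c in compositions(N, 3) if multinomial_coef(c) <= 5}
risk = Fraction(sum(multinomial_coef(c) for c in region), 3 ** N)

observed = (3, 2, 0)  # A: 3, B: 2, C: 0
print(len(region), risk, round(float(risk), 3))  # 9 11/81 0.136
print("reject" if observed in region else "cannot reject")  # cannot reject
```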

If the hypothesis had been other than an equal division of sentiment, the multinomial distribution describing the set of all possible samples would have been a skewed one such as that represented by Table 32 and Fig. 93. In these instances, the selection of a symmetrical region of rejection presents difficulties. This problem need not be discussed here, however, for in very few instances are such small samples taken as that assumed in the given problem. When larger samples are used, it is possible to make a mathematical transformation of a skewed multinomial distribution that makes it more symmetrical and at the same time permits a redescription of the distribution in terms of a new statistic that has a simple one-dimensional sampling distribution. This is discussed in the ensuing section.

$$\sum \frac{(N_i - Np_i)^2}{Np_i}$$

One of the important contributions of mathematical statistics has been the demonstration that when the set of all possible samples from a manifold population is described in terms of the statistic $\sum (N_i - Np_i)^2/(Np_i)$, its sampling distribution is reasonably well described, if $N$ is large, by the $\chi^2$ distribution. Hence, instead of using the multinomial distribution for testing various hypotheses, it is much more practical in the case of large samples to calculate the sample statistic $\sum (N_i - Np_i)^2/(Np_i)$ and use the $\chi^2$ distribution, for which tables are readily available.

The complete argument by which this important conclusion is reached is not given in this section. Nevertheless, multinomial problems arise so frequently, and the $\chi^2$ distribution is used as a substitute for the multinomial distribution in so many of these problems, that it is well for the student to understand the basis upon which this use of the $\chi^2$ distribution rests. The argument by which it is established that the statistic $\sum (N_i - Np_i)^2/(Np_i)$ has a sampling distribution of the form of the $\chi^2$ distribution is therefore discussed in a simplified way in the following section.

A Simple Version of the Argument: the Symmetrical Case. In Fig. 94, the plane ABC from Fig. 92 is laid out horizontally; the sample points are all marked with heavy dots, and the coordinates of each point are written beside it. In addition, the probability of each is indicated. It will be noted that the point X, which was not shown in the original figure (Fig. 92), represents the point whose coordinates are the means of the $N_i/N$, i.e., for this case, $p_1 = \frac{1}{3}$, $p_2 = \frac{1}{3}$, $p_3 = \frac{1}{3}$. Although the point X is not a sample point,¹ it is nevertheless important in that it marks the center of the distribution.

¹ If nine cases, for example, had been chosen instead of five, the mean point would have also been one of the sample points.
It will be noted from Fig. 94 that, for the particular multinomial distribution represented, sample points of equal probability all lie equally distant from the mean point X. That is, points of equal probability lie in a circle with center at X. This characteristic of the distribution suggests that the set of all possible samples might be described in a somewhat simpler way, viz., by showing how the probability varies with the distance of sample points from the mean point X or, preferably, with the square of the distance, since the latter is easier to calculate.

In pursuance of this line of thought, the following table of probabilities may be calculated from Fig. 94 and used in all practical problems as a substitute for the original multinomial distribution. The point (.2,.4,.4), for example, is distant from the mean point X by¹

$$D^2 = \left(.2 - \tfrac{1}{3}\right)^2 + \left(.4 - \tfrac{1}{3}\right)^2 + \left(.4 - \tfrac{1}{3}\right)^2 = .0266+$$

Points (.4,.2,.4) and (.4,.4,.2) are equally distant from X. The probability of getting a particular one of these three sample results is 30/243, and the probability of getting any one of them, i.e., the probability of getting a sample point distant from the mean point by $D^2 = .0266+$, is $3(30)/243 = .370$. Similar calculations give the other entries in Table 33.

Table 33. Probabilities of Specified Values of $D^2$

 $D^2$ = square of distance of        $P(D^2)$ = probability of getting a sample
 sample point from mean point         point distant by $D$ from mean point
   .0266+                               90/243 = .370
   .1066+                               60/243 = .247
   .1866+                               60/243 = .247
   .3466+                               30/243 = .123
   .6666+                                3/243 = .012

A more useful form of Table 33 is given in Table 34. This cumulates the probabilities of Table 33; it thus gives the probability of getting a sample point that is at least as far from the mean point as the given sample point.

Table 34. Probabilities of $D^2 \geq$ Specified Quantities

 $D^2$        Probability of as great or greater $D^2$
   .0266+       243/243 = 1.000
   .1066+       153/243 = .629
   .1866+        93/243 = .382
   .3466+        33/243 = .135
   .6666+         3/243 = .012

The advantage of Table 34 is the ease with which it can be used to test the hypothesis $p_1 = p_2 = p_3 = \frac{1}{3}$. If .135 is adopted as the coefficient of risk, then Table 34 indicates immediately that samples for which $D^2 \geq .3466$ would constitute a symmetrical region of rejection having the probability .135. Hence, given any sample, it is merely necessary to calculate

$$D^2 = \left(\frac{N_1}{N} - \frac{1}{3}\right)^2 + \left(\frac{N_2}{N} - \frac{1}{3}\right)^2 + \left(\frac{N_3}{N} - \frac{1}{3}\right)^2$$

and to note whether the sample $D^2$ falls above or below .3466. There is no longer need to draw a diagram such as Fig. 92 in order to mark off a two-dimensional region of rejection, for the set of all possible samples has now been described in terms of a single statistic, $D^2$, whose sampling distribution (Table 34) is one-dimensional. From the way Table 34 was derived, however, it is known that the upper .135 tail of the distribution of $D^2$ is the mathematical equivalent of the symmetrical two-dimensional region of Fig. 92. The simpler one-dimensional description of the set of all possible samples thus accomplishes the same result as the multidimensional description.

¹ The general equation for measuring the square of the distance between points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ is

$$d^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2$$
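Tables 33 and 34 can likewise be rebuilt by computing $D^2$ for every sample point and grouping the probabilities. This is a sketch in Python with exact fractions; the helper names are ours. Printed to four decimals the distances appear rounded (e.g. 0.0267) where the tables truncate them (.0266+).

```python
from collections import defaultdict
from fractions import Fraction
from math import factorial

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_coef(counts):
    """N!/(N1! N2! ... Nk!) for the given class counts."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef

N = 5
third = Fraction(1, 3)

# Table 33: total probability (in 243ds) of the sample points at each squared distance.
table33 = defaultdict(int)
for counts in compositions(N, 3):
    d2 = sum((Fraction(c, N) - third) ** 2 for c in counts)
    table33[d2] += multinomial_coef(counts)

# Table 34: cumulate from the largest D^2 downward, giving P(D^2 >= value).
distances = sorted(table33)
table34 = {}
running = 0
for d2 in reversed(distances):
    running += table33[d2]
    table34[d2] = running

for d2 in distances:
    print(f"{float(d2):.4f}  {table33[d2]:>3}/243  {table34[d2]:>3}/243")
```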

The General Case. When the hypothesis gives to $p_1$, $p_2$, and $p_3$ values that are not equal, the result is a skewed multinomial distribution. Generally, therefore, it is not possible to draw neat circles through points of equal probability, such as was done in the special case just considered; nor is it possible to redescribe the set of all possible samples exactly in terms of the distance $D^2$ from the point given by the mean percentages $p_1$, $p_2$, and $p_3$. Fortunately, however, a certain procedure can be adopted that will give approximate results similar to those obtained in the simpler case. This procedure is as follows:

In the more general case, considerable advantage is gained by changing the scale units. Instead of taking $N_1/N$, $N_2/N$, and $N_3/N$ as the coordinates of a sample point, it is found convenient to take $N_1/\sqrt{Np_1}$, $N_2/\sqrt{Np_2}$, and $N_3/\sqrt{Np_3}$ as the sample coordinates.¹ This means that the scales are modified in proportion to the square roots of the hypothetical probabilities, the net effect of which is to make the distribution more symmetrical. The point represented by the mean percentages becomes, in terms of the new coordinates, $p_1\sqrt{N/p_1}$, $p_2\sqrt{N/p_2}$, and $p_3\sqrt{N/p_3}$, or, simply, $\sqrt{Np_1}$, $\sqrt{Np_2}$, and $\sqrt{Np_3}$; and the square of the distance between this point and any sample point becomes

$$\sum \left(\frac{N_i}{\sqrt{Np_i}} - \sqrt{Np_i}\right)^2 \quad\text{or}\quad \sum \frac{(N_i - Np_i)^2}{Np_i}$$

Owing to the approximate symmetry resulting from this change in the scale units, a symmetry that improves the larger the value of $N$, it is possible to give an approximate description of the set of all possible samples in terms of

$$D'^2 = \sum \frac{(N_i - Np_i)^2}{Np_i}$$

When this is done mathematically, it is found that the distribution of probabilities is approximately of the form of a $\chi^2$ distribution, where $n$ in the $\chi^2$ equation is taken equal to the number of classes minus 1.

¹ That is, the old coordinates are multiplied by $\sqrt{N/p_i}$.

To illustrate this important conclusion, suppose that sampling is made from a population in which the various items fall into three classes. Then, on the assumption that the percentages of items in these three classes are $p_1$, $p_2$, and $p_3$, the relative frequencies, or probabilities, with which various samples would have values of $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ equal to or exceeding certain specified values are as follows:¹


Probabilities of $D'^2 \geq$ Specified Quantities

                      Probability of as great or greater value
 Values of $D'^2$     (= probability of as great or greater $\chi^2$ for $n = 3 - 1 = 2$)
    .0201                .99
    .0404                .98
    .103                 .95
    .211                 .90
    .446                 .80
    .713                 .70
   1.386                 .50
   2.408                 .30
   3.219                 .20
   4.605                 .10
   5.991                 .05
   7.824                 .02
   9.210                 .01


That is, in the set of all possible samples from the given population, the quantity $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ would have a value equal to or greater than 5.991 in 5 per cent of the samples, or a value equal to or exceeding 4.605 in 10 per cent of the samples, and so forth.
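For $n = 2$ the upper tail of the $\chi^2$ distribution has the simple closed form $P(\chi^2 \geq x) = e^{-x/2}$, so the table above can be reproduced without consulting a $\chi^2$ table. A minimal Python sketch (the function name `chi2_tail_2df` is ours); the closed form holds only for two degrees of freedom.

```python
from math import exp

def chi2_tail_2df(x):
    """P(chi-square with n = 2 >= x); for two degrees of freedom this is exp(-x/2)."""
    return exp(-x / 2)

# (value of D'^2, tabled probability of as great or greater value)
table = [
    (0.0201, 0.99), (0.0404, 0.98), (0.103, 0.95), (0.211, 0.90),
    (0.446, 0.80), (0.713, 0.70), (1.386, 0.50), (2.408, 0.30),
    (3.219, 0.20), (4.605, 0.10), (5.991, 0.05), (7.824, 0.02),
    (9.210, 0.01),
]
for value, tabled in table:
    print(f"{value:>7}  tabled {tabled:.2f}  exp(-x/2) = {chi2_tail_2df(value):.4f}")
```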

In the original description of the $\chi^2$ distribution, $n$ was said to determine the nature of the curve.² Here $n$ may be identified with the degrees of freedom. The reason for this in the present problem should now be clear. When the number of classes into which a discrete population is divided is three, as has been assumed in the above discussion, then the quantity

$$D'^2 = \sum \frac{(N_i - Np_i)^2}{Np_i}$$

has a sampling distribution of the form of the $\chi^2$ distribution with $n = 3 - 1 = 2$. It will be recalled, however, that when the set of all possible sample points is described in terms of the percentages falling into each class, the various possible sample percentages must conform to the equation $N_1/N + N_2/N + N_3/N = 1$. Geometrically this meant that all the sample points had to lie on a plane (the plane ABC of Fig. 92, for example). It was accordingly said that there were only two degrees of freedom for the selection of the various possible samples. Since for the case of three classes $n$ in the $\chi^2$ formula is equal to 2, this $n$ becomes the same as the degrees of freedom in the given problem.

¹ This is merely the row “$n = 2$” of a $\chi^2$ table (Appendix, Table VIII).

² See p. 111.

This relationship also holds true for any number of classes. For the inevitable equation $N_1/N + N_2/N + \cdots + N_k/N = 1$ (inevitable because the sum of all the class percentages must be 100 per cent) always reduces the degrees of freedom with which various possible samples may be selected from $k$ to $k - 1$, while it is also always true that the quantity $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ has a distribution approximately of the form of the $\chi^2$ distribution, with $n$ in the $\chi^2$ equation equal to $k - 1$. Hence in problems of this kind the $n$ in the $\chi^2$ equation is always the same as the degrees of freedom.
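Two consequences of this can be checked exactly for a moderate sample by enumerating every possible sample rather than simulating. Since $E[(N_i - Np_i)^2] = Np_i(1 - p_i)$, the mean of $D'^2$ is exactly $\sum (1 - p_i) = k - 1$, which is also the mean of a $\chi^2$ distribution with $n = k - 1$; and the tail probability of $D'^2$ should be close to the tabled $\chi^2$ value. The sketch below is in Python; the helper names, the sample sizes ($N = 100$ and $N = 30$), and the class probabilities are illustrative assumptions, and 5.991 and 7.815 are the .05 points of $\chi^2$ for $n = 2$ and $n = 3$.

```python
from math import factorial

def compositions(n, k):
    """Yield every k-tuple of non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n, -1, -1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def multinomial_prob(counts, probs):
    """Multinomial probability of the class counts (floating point)."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    prob = float(coef)
    for p, c in zip(probs, counts):
        prob *= p ** c
    return prob

def d_prime_sq(counts, probs):
    """D'^2 = sum of (N_i - N p_i)^2 / (N p_i)."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

def exact_summary(n, probs, x):
    """Exact mean of D'^2 and exact P(D'^2 >= x), by enumerating all samples."""
    mean = tail = 0.0
    for counts in compositions(n, len(probs)):
        p = multinomial_prob(counts, probs)
        d2 = d_prime_sq(counts, probs)
        mean += p * d2
        if d2 >= x:
            tail += p
    return mean, tail

mean3, tail3 = exact_summary(100, (1/2, 1/3, 1/6), 5.991)
print(round(mean3, 6), round(tail3, 4))  # mean is k - 1 = 2; tail should be near .05

mean4, tail4 = exact_summary(30, (0.4, 0.3, 0.2, 0.1), 7.815)
print(round(mean4, 6), round(tail4, 4))  # k = 4 classes, so the mean is k - 1 = 3
```

Enumeration is used instead of random simulation so that the mean comes out as exactly $k - 1$ up to rounding, with no sampling noise.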


Estimation of Population Percentages. Zones of Confidence. When, as in the case of the multinomial distribution and the distribution of $\sum (N_i - Np_i)^2/(Np_i)$, there is more than one population parameter to be estimated, the determination of confidence intervals for these parameters presents difficulties. In abstract terms, the problem is simple enough. Thus, for any given case, the .05 point of the distribution of $\sum (N_i - Np_i)^2/(Np_i)$ could be determined from a $\chi^2$ table (all that need be known for this is the value of $n$ in the $\chi^2$ formula), and the sample value of $\sum (N_i - Np_i)^2/(Np_i)$ could be set equal to this value. The resulting equation would give the values of the population percentages $p_i$ that would just be on the border of reasonableness (assuming a coefficient of confidence = .95).
Geometrically, the locus of the values of the $p_i$'s satisfying this equation would constitute a definite geometrical figure such as an ellipse or, when there are more than three parameters, some sort of an ellipsoid. All values lying outside this locus would be considered unreasonable values, and all inside would be considered reasonable values. The enclosed area would constitute a “zone of confidence,” and in repeated sampling it might be said that this zone would include the true value 95 per cent of the time.

The difficulty that arises in trying to make use of the abstract analysis above is that the locus marking the zone of confidence is not a simple ellipse or ellipsoid but a much more complicated figure. In fact, practical determination of the values of $p_i$ constituting that locus would involve so much trouble that it is almost never undertaken. In general, the investigator is content to test particular hypotheses that may be of importance rather than attempting to class all hypotheses into those that are reasonable and those that are unreasonable.

Maximum-likelihood. Although determination of zones of confidence for the values of the population percentages is thus not usually undertaken, single maximum-likelihood estimates may be readily made. For it is found that the values of $p_1$, $p_2$, $p_3$, etc., that maximize the probability of a given sample result are but the sample percentages $N_1/N$, $N_2/N$, $N_3/N$, etc.¹
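The result can be confirmed numerically for any particular sample. The sketch below is in Python; the sample counts (6, 3, 1), the grid step, and the helper names are hypothetical illustrations. It searches a grid of admissible $(p_1, p_2, p_3)$ with $p_1 + p_2 + p_3 = 1$ for the largest log-likelihood $\sum N_i \log p_i$ (the constant term $\log N!/(N_1!\,N_2!\,N_3!)$ does not affect where the maximum lies) and finds it at the sample percentages.

```python
from math import log

def log_likelihood(counts, probs):
    """Sum of N_i * log(p_i); the constant log N!/(N1! N2! N3!) is omitted."""
    return sum(c * log(p) for c, p in zip(counts, probs))

def grid_mle(counts, steps=100):
    """Best (p1, p2, p3), in units of 1/steps, over a grid with p1 + p2 + p3 = 1."""
    best, best_ll = None, float("-inf")
    for i in range(1, steps):
        for j in range(1, steps - i):
            k = steps - i - j  # p3 is determined by p1 and p2
            ll = log_likelihood(counts, (i / steps, j / steps, k / steps))
            if ll > best_ll:
                best, best_ll = (i, j, k), ll
    return best

counts = (6, 3, 1)  # hypothetical sample of N = 10
print(grid_mle(counts))  # (60, 30, 10), i.e. p = .6, .3, .1 = N_i/N
```

A grid search is only a check, not a proof: the log-likelihood is strictly concave, so when the sample percentages lie on the grid they are the unique grid maximum.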

Maximum-likelihood estimates of the population percentages are thus a very simple matter.

¹ This may be demonstrated as follows: If there are three class divisions, the probability of a given sample point is

$$P\left(\frac{N_1}{N}, \frac{N_2}{N}, \frac{N_3}{N}\right) = \frac{N!}{N_1!\,N_2!\,N_3!}\,p_1^{N_1} p_2^{N_2} p_3^{N_3}$$

By the method of maximum likelihood, the values of $p_1$, $p_2$, and $p_3$ are to be chosen so that the logarithm of this probability is a maximum. The percentages $p_1$, $p_2$, and $p_3$ are not independent, however, for they must add up to 1; that is, $p_1 + p_2 + p_3 = 1$. In the process of estimation, therefore, let $p_1$ and $p_2$ be the independent estimates to be made and let $p_3$ be determined from $p_1$ and $p_2$. Under these conditions the maximizing values of $p_1$, $p_2$, and $p_3$ must be such that the partial derivatives of the logarithm of the probability with respect to $p_1$ and $p_2$ are both equal to zero. The logarithm of the above probability is

$$\log \frac{N!}{N_1!\,N_2!\,N_3!} + N_1 \log p_1 + N_2 \log p_2 + N_3 \log p_3$$

The partial derivatives of this with respect to $p_1$ and $p_2$, it being remembered that $\partial p_3/\partial p_1 = \partial p_3/\partial p_2 = -1$, are

$$\frac{N_1}{p_1} - \frac{N_3}{p_3} = 0 \qquad \frac{N_2}{p_2} - \frac{N_3}{p_3} = 0$$

or $N_1 p_3 = N_3 p_1$ and $N_2 p_3 = N_3 p_2$. Now add these two equations, and to each side of the total add $N_3 p_3$, giving

$$(N_1 + N_2 + N_3)\,p_3 = N_3 (p_1 + p_2 + p_3)$$

Since $N_1 + N_2 + N_3 = N$ and $p_1 + p_2 + p_3 = 1$, this gives (using a breve to denote an estimate)

$$\breve{p}_3 = \frac{N_3}{N}$$

Then from the two previous equations it is found that $\breve{p}_1 = N_1/N$ and $\breve{p}_2 = N_2/N$. As stated in the body of the text, the maximum-likelihood estimates of the population percentages are the sample percentages.

APPLICATION OF THEORY TO SELECTED PROBLEMS

Sampling Public Opinion. In sampling public opinion, the division of opinion is often threefold or more. In all such instances the foregoing theory becomes immediately applicable. Consider, for example, the following problem:

In 1940 the election returns were as follows:

                                              Per Cent
 F. D. Roosevelt, Democratic candidate          54.7
 Wendell Willkie, Republican candidate          44.8
 Other candidates                                0.5

Suppose that in the fall of 1944 a given polling agency took a random sample of 1,000 voters² and found that 510 were for Roosevelt, again the Democratic candidate, 478 were for Dewey,

² To avoid complications, a simple random sample is assumed to be taken. Actually, representative random sampling is employed by most polling agencies. See pp. 183-185.
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the Republican candidate, and 12 were for other candidates. 
Could it have been reasonably inferred from this sample that 
popular sentiment regarding the various parties had changed? 
In other words, would the hypothesis of 54.7 per cent Democratic, 
44.8 per cent Republican, and 0.5 per cent other affiliations have 
been a tenable hypothesis in view of the sample results? 

To answer this question the investigation proceeds as follows:

First the statistic $D'^2 = \sum (N_i - Np_i)^2/(Np_i)$ is calculated. This
gives, in the present instance,

$$D'^2 = \frac{(510 - 547)^2}{547} + \frac{(478 - 448)^2}{448} + \frac{(12 - 5)^2}{5} = 14.3$$

Next it is noted that there are only three classes, so that $D'^2$ is
approximately distributed as $\chi^2$, with the degrees of freedom n
equal to 3 - 1 = 2. Finally a region of rejection is chosen.
If .05 is taken as the coefficient of risk, a $\chi^2$ table shows that
for n = 2 the region consisting of values of $D'^2$ equal to or
greater than 5.991 is an appropriate region of rejection to select. 1
In the given instance the sample $D'^2 = 14.3$ clearly falls in the
region of rejection. Consequently, the hypothesis that there
has been no change in political sentiment cannot be accepted.
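The arithmetic of this test is easy to check by machine. The Python sketch below recomputes $D'^2$ from the sample counts and the 1940 percentages and compares it with the tabled .05 point of $\chi^2$ for n = 2 (5.991, the value quoted in the text). The function name is an illustrative choice, not part of any library.

```python
# Sketch: the D'^2 (Pearson chi-square) test for the 1944 poll example.
observed = [510, 478, 12]            # Roosevelt, Dewey, other candidates
hypothesis = [0.547, 0.448, 0.005]   # 1940 election percentages
CRIT_05_2DF = 5.991                  # .05 point of chi-square, n = 2


def d_prime_sq(observed, probs):
    """Sum of (N_i - N p_i)^2 / (N p_i) over the classes."""
    n = sum(observed)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(observed, probs))


d2 = d_prime_sq(observed, hypothesis)
df = len(observed) - 1
reject = d2 >= CRIT_05_2DF
print(f"D'^2 = {d2:.2f}, n = {df}, reject hypothesis: {reject}")
# prints: D'^2 = 14.31, n = 2, reject hypothesis: True
```

Most of the total comes from the small "other candidates" class, where an excess of 7 votes over an expected 5 contributes 9.8 by itself.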

Since the maximum-likelihood estimates of population percentages
are the sample percentages, 2 it follows in this case that
the maximum-likelihood estimates to be made of the true political
sentiment are 51 per cent Democratic, 47.8 per cent Republican,
and 1.2 per cent other parties.
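A quick numerical check supports the claim that the sample percentages maximize the likelihood. The Python sketch below evaluates the multinomial log-likelihood at the sample percentages and at nearby points that still add up to 1. It drops the factorial term, which does not depend on the p's. The perturbation size is an arbitrary illustrative choice.

```python
import math

counts = [510, 478, 12]
n = sum(counts)
mle = [c / n for c in counts]   # sample percentages: .510, .478, .012


def log_likelihood(probs):
    """Sum of N_i log p_i; the multinomial coefficient is constant in the p's."""
    return sum(c * math.log(p) for c, p in zip(counts, probs))


best = log_likelihood(mle)
delta = 0.005
# Shift probability between each pair of classes so the p's still add up to 1.
for i in range(3):
    for j in range(3):
        if i != j:
            trial = list(mle)
            trial[i] += delta
            trial[j] -= delta
            assert log_likelihood(trial) < best
print([round(p, 3) for p in mle])
# prints: [0.51, 0.478, 0.012]
```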

Goodness of Fit of a Frequency Curve. The use of the $\chi^2$
distribution in testing the goodness of fit of a frequency curve
has already been discussed in a practical way in Chap. VII. 3
The theoretical basis of the method there outlined will now be
discussed.

1 It will be recalled that if $N_1/\sqrt{Np_1}$, $N_2/\sqrt{Np_2}$, etc., are taken as the
coordinates of sample points, $D'$ constitutes the distance from a sample
point to the mean point $\sqrt{Np_1}$, $\sqrt{Np_2}$, etc. (see p. 326). Accordingly,
the region $D' \geq \sqrt{5.991}$ is bounded by a circle. This region is unbiased with
respect to values of $p_1$, $p_2$, and $p_3$ other than those chosen by hypothesis, in the
sense that the probability of samples falling in this region is least when $p_1$, $p_2$, $p_3$
have the given hypothetical values and increases in whatever direction
$p_1$, $p_2$, and $p_3$ deviate from these hypothetical values.

2 See p. 329.

3 See pp. 137-152.
The procedure outlined previously called for the calculation
of $\sum (F - f)^2/f$, where F was the frequency of cases in any
class interval given by a sample histogram and f was the
corresponding frequency for that interval given by some theoretical
frequency curve. It should be recognized now that this is
merely a special form of $\sum (N_i - Np_i)^2/(Np_i)$. For $N_i$ is the sample
number of cases falling in any class and hence is the same as F
in the other equation, and $Np_i$ is the number that would be
expected to fall in that class if the sample total were divided
in the same proportion as the population total and hence is the
same as f. Since, as shown in the general discussion earlier in this
chapter, $\sum (N_i - Np_i)^2/(Np_i)$ is approximately distributed in the
form of a $\chi^2$ distribution, so in the special case of a comparison of a
sample histogram with a theoretical frequency curve the quantity
$\sum (F - f)^2/f$ is approximately distributed in the form of a $\chi^2$
distribution.

There is one thing new in this problem, however, and this
has to do with the determination of the degrees of freedom,
n. When the theoretical curve that is fitted to a sample
histogram is a normal curve, say, the theoretical probabilities
pertaining to each class are not given directly by the mere hypothesis of
normality. It is first necessary to estimate the mean and
standard deviation of the curve from the sample. The effect of
this is to impose additional conditions on the sample frequencies
that reduce the degrees of freedom from k (the number of classes)
minus 1 to k minus 3. For in all cases there is the condition

$$\frac{N_1}{N} + \frac{N_2}{N} + \cdots + \frac{N_k}{N} = \sum_i \frac{N_i}{N} = 1$$

Setting the mean and standard deviation of the curve equal to
the mean and standard deviation of the sample histogram imposes
the additional conditions 1

$$\sum_i \frac{N_i}{N}\,X_i = \sum_i p_i X_i, \qquad \sum_i \frac{N_i}{N}\,(X_i - \bar X)^2 = \sum_i p_i\,(X_i - \bar X)^2$$

This means that in the set of all possible samples (in this case,
sample histograms) that might be drawn from a given population,
only those samples will be considered for which the sample
percentages $N_i/N$ satisfy the above three equations.

1 It will be recalled that $N_i$ is the same as F and that $p_i$ is the same as f/N.
It will also be recalled that when relative frequencies are used the mean of a
distribution is $\sum (F/N)X$ and its standard deviation is $\sqrt{\sum (F/N)(X - \bar X)^2}$.

Probabilities are thus calculated with respect to this special
group of the set of all possible samples, and it is in this sense
that the degrees of freedom are reduced. In general, when a
frequency curve is fitted to a sample histogram and c parameters
of the curve are estimated from the sample histogram, c + 1
conditions are imposed upon the sample percentages and the part
of the set of all possible samples that is considered has k - c - 1
degrees of freedom. Hence $\sum (N_i - Np_i)^2/(Np_i)$, or its equivalent in
terms of the frequency distribution analysis, viz., $\sum (F - f)^2/f$, has a
sampling distribution of approximately the form of the $\chi^2$
distribution with the degrees of freedom equal to k - c - 1.
This is the explanation of the special value given to n in the
procedure described in Chap. VII.
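The rule n = k - c - 1 can be sketched in Python for the normal-curve case, where c = 2 (a mean and a standard deviation). The sketch fits a normal curve to a grouped histogram by setting the curve's mean and standard deviation equal to those of the histogram, as in the conditions above. It then converts the curve into expected class frequencies f and returns $\sum (F - f)^2/f$ together with its degrees of freedom. For illustration it uses the Totals column of Table 39, which appears later in this chapter, as the histogram. The class midpoints, the treatment of the two end classes as open-ended (so that the f's add up to N), and the function names are choices made for this sketch, not necessarily the procedure of Chap. VII. The resulting statistic would then be compared with a $\chi^2$ table for n = 4.

```python
import math

# Illustration only: the Totals column of Table 39 used as a sample histogram.
classes = [(31, 40), (41, 50), (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
observed = [22, 94, 190, 256, 228, 157, 53]


def normal_cdf(x, mean, sd):
    """Cumulative normal probability; math.erf also accepts +/- infinity."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def normal_fit_chi_square(classes, observed):
    n_total = sum(observed)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    # Estimate the two parameters from the histogram (relative frequencies F/N).
    mean = sum(F * x for F, x in zip(observed, mids)) / n_total
    sd = math.sqrt(sum(F * (x - mean) ** 2 for F, x in zip(observed, mids)) / n_total)
    # Real class limits lie halfway between classes; the end classes are left open.
    cuts = [-math.inf] + [hi + 0.5 for _, hi in classes[:-1]] + [math.inf]
    expected = [
        n_total * (normal_cdf(b, mean, sd) - normal_cdf(a, mean, sd))
        for a, b in zip(cuts[:-1], cuts[1:])
    ]
    stat = sum((F - f) ** 2 / f for F, f in zip(observed, expected))
    c = 2                          # parameters estimated: mean and sd
    df = len(observed) - c - 1     # k - c - 1
    return stat, df, expected


stat, df, expected = normal_fit_chi_square(classes, observed)
print(f"chi-square = {stat:.2f} with n = {df}")
```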

Testing Independence. Still another use of the multinomial
distribution and the $\chi^2$ distribution based upon it is to test
whether one classification is independent of another classification
or, conversely, whether the two are associated. Particular
measures of association or correlation were discussed in Chap.
XII, and tests of significance were developed for these measures.
At this point, attention will be directed primarily to testing the
existence of an association, whatever its form or degree.

Consider a simple case. In general, the percentage of males
and females in a large population tends to be the same, no
matter what the color of the people. That is, the sex ratio is
independent of color. Within a limited area, however, this
may not hold true. Economic and social forces in certain
metropolitan districts, for example, may cause the percentage of
black and other colored males to exceed considerably the
percentage of white males, and vice versa for females. Suppose that a
sample of 1,000 people is taken from a given city with a view to
studying the relationship between sex and color. For this
purpose the 1,000 people are “cross classified” as follows:


Table 36. 1,000 People Classified as to Color and Sex

                     Males    Females    Totals
Whites ..........      380        320       700
Blacks ..........      150         50       200
Other colors ....       70         30       100
Totals ..........      600        400     1,000


The question is: Does this sample of 1,000 people demonstrate 
definitely that the population of the city as a whole has a higher 
percentage of black and other-color males than white males, or 
can this sample result reasonably be considered the effect of 
chance? Or, to put it another way, does this sample show 
conclusively that the sex ratio in this city is not independent of 
color? 

In a problem of this kind, it is more convenient to test the
negative, or “null,” hypothesis. For if the classifications are
assumed to be independent within the population as a whole, it
is then possible to tell something about the expected distribution
of the cases. Thus, if two classifications are independent, the
distribution of cases in the various categories of one classification
may be expected to be the same for each of the categories of the
other classification. On the other hand, if the positive hypothesis
of correlation is set up, nothing can be told about the expected
distribution of cases unless the exact form of correlation is
specified. Thus, in the absence of specific information regarding
the nature of the correlation, it is customary to test the null
hypothesis of independence.

With reference to the illustration, the assumption of independence
means that the percentage of males (and hence the
percentage of females) in the population as a whole is the same
for all color groups and the percentages of whites and blacks
(and hence the percentage of other colored people) are the same
for both sexes. That is, the probability of picking a male at
random (and hence the probability of picking a female) is
independent of the probability of picking a white person at
random, the probability of picking a black person at random, and
the probability of picking a person of another color at random.
Likewise, both the probability of picking a white person at
random and the probability of picking a person of another color
are independent of the probability of picking a male at random
and the probability of picking a female at random. Under these
conditions, the probability of picking a white male, for example,
is, according to the multiplication theorem for independent
attributes, the product of the probability of picking a male and
the probability of picking a white person. Or, again, the
probability of picking a black female is simply the product
of the probability of picking a female and the probability of
picking a black person.

Symbolically, independence may be described as follows: 1
If one classification is represented by the roman numerals I and
II and the other by the arabic numbers 1, 2, and 3, then if the
two classifications are independent, $p_I$ (and hence $p_{II}$, which
equals $1 - p_I$) is independent of $p_1$, $p_2$, and $p_3$; and $p_1$ and $p_2$
(and hence $p_3$, which equals $1 - p_1 - p_2$) are independent of $p_I$
and $p_{II}$. Hence,

$$p_{I1} = p_I\,p_1; \quad p_{I2} = p_I\,p_2; \quad p_{I3} = p_I\,p_3; \quad p_{II1} = p_{II}\,p_1; \quad p_{II2} = p_{II}\,p_2; \quad p_{II3} = p_{II}\,p_3$$

Table 37.—Symbolic Illustration of Independence

              I          II
1           p_I1       p_II1       p_1
2           p_I2       p_II2       p_2
3           p_I3       p_II3       p_3
            p_I        p_II        1


Now it is to be noted that the assumption of independence 
does not give the actual values of any of the above probabilities 
but merely states the existence of certain relationships among 
them. 2 Consequently, if the actual frequencies to be expected 
on the basis of the assumption of independence are to be found, 
it is necessary to estimate some of the underlying probabilities 
from the sample data (just as the mean and standard deviation 
of the population were estimated in the previous example). 


1 Reference to Table 37 may help the reader at this point. 

2 Just as knowledge that a population is normally distributed tells nothing
about the actual distribution of probabilities but gives only its general form.
Thus, for the problem in hand, the probability of picking a male
at random ($p_I$) can be estimated as equal to the percentage of
males in the sample (600/1,000 = 60 per cent), the probability of
picking a white person at random ($p_1$) can be estimated as equal to
the percentage of white persons in the sample (700/1,000 = 70
per cent), and the probability of picking a black person at random
($p_2$) can be estimated as equal to the percentage of black persons
in the sample (200/1,000 = 20 per cent). Symbolically, these
“estimating equations” are

$$\frac{N_{I1} + N_{I2} + N_{I3}}{N} = p_I, \qquad \frac{N_{I1} + N_{II1}}{N} = p_1, \qquad \frac{N_{I2} + N_{II2}}{N} = p_2$$

or

$$N_{I1} + N_{I2} + N_{I3} = Np_I, \qquad N_{I1} + N_{II1} = Np_1, \qquad N_{I2} + N_{II2} = Np_2$$

where the N's refer to the sample frequencies in the various
classes, N is the total number of cases, and the p's are the
estimated probabilities.

Estimates of the other class probabilities 1 can be computed
from these three probabilities with the help of the relationships
given by the hypothesis of independence and the laws for addition
and multiplication of probabilities. Thus

$$\begin{aligned}
p_{II} &= 1 - p_I = 100 - 60 = 40 \text{ per cent}\\
p_3 &= 1 - p_1 - p_2 = 100 - 70 - 20 = 10 \text{ per cent}\\
p_{I1} &= p_I\,p_1 = (.60)(.70) = 42 \text{ per cent}\\
p_{I2} &= p_I\,p_2 = (.60)(.20) = 12 \text{ per cent}\\
p_{I3} &= p_I\,p_3 = (.60)(.10) = 6 \text{ per cent}\\
p_{II1} &= p_{II}\,p_1 = (.40)(.70) = 28 \text{ per cent}\\
p_{II2} &= p_{II}\,p_2 = (.40)(.20) = 8 \text{ per cent}\\
p_{II3} &= p_{II}\,p_3 = (.40)(.10) = 4 \text{ per cent}
\end{aligned}$$

1 It makes no difference what class probabilities are selected as the original
three to be estimated directly from the sample data.
Applied to the number in the sample (N = 1,000), the foregoing
estimated probabilities yield the following theoretical, or
expected, frequencies for the various classes:

Table 38.—1,000 People Classified as to Color and Sex

                     Males    Females    Totals
Whites ..........      420        280       700
Blacks ..........      120         80       200
Other colors ....       60         40       100
Totals ..........      600        400     1,000
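The step from the three estimated probabilities to Table 38 can be sketched in Python. Following the text, the sketch estimates $p_I$, $p_1$, and $p_2$ from the marginal totals of Table 36 and completes $p_{II}$ and $p_3$ by the addition law. It then multiplies marginal probabilities (the hypothesis of independence) to get each cell probability and scales by N. The dictionary layout and the names are illustrative choices.

```python
N = 1000
p_I = 600 / N            # males
p_1 = 700 / N            # whites
p_2 = 200 / N            # blacks
p_II = 1 - p_I           # females, by the addition law
p_3 = 1 - p_1 - p_2      # other colors, by the addition law

sex = {"Males": p_I, "Females": p_II}
color = {"Whites": p_1, "Blacks": p_2, "Other colors": p_3}

# Independence: each cell probability is the product of its two marginal probabilities.
expected = {
    (c, s): N * pc * ps
    for c, pc in color.items()
    for s, ps in sex.items()
}
for (c, s), f in expected.items():
    print(f"{c:>12} {s:<7} {f:6.1f}")
```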


The actual frequencies differ considerably from these expected
frequencies, and the question arises: Are these differences
sufficiently great to disprove the hypothesis of independence?

This is a problem involving estimated probabilities. Like
the previous problems, however, it can be shown mathematically
that the quantity $\sum (N_i - Np_i)^2/(Np_i)$ has a sampling distribution
that is approximately 1 of the form of the $\chi^2$ distribution with the
degrees of freedom n equal to k - c - 1, k being the number of
classes and c the number of additional restrictions imposed by the
process of estimating the underlying probabilities. The number of these
conditions is always equal to the number of the class probabilities
originally estimated from the sample data (i.e., the number of
estimating equations). Another way of finding the degrees of
freedom is to note the number of cells in the cross classification,
or “contingency,” table (such as Table 36) that can be filled in
arbitrarily without changing the marginal totals. Since one
cell must be left in each row and one in each column in order to
make the frequencies in each row or column add up to the given
marginal totals, it follows that a table of r rows and c columns
will yield (r - 1)(c - 1) degrees of freedom.
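The two ways of counting agree, as a short derivation shows. To keep the symbols apart, write s (a letter not used in the text) for the number of columns, so that c keeps its meaning as the number of estimating equations. A table of r rows and s columns has k = rs classes. The estimating equations fix r - 1 row probabilities and s - 1 column probabilities, so c = (r - 1) + (s - 1). Then

$$n = k - c - 1 = rs - (r - 1) - (s - 1) - 1 = rs - r - s + 1 = (r - 1)(s - 1)$$

For Table 36, r = 3 and s = 2, so n = 6 - 3 - 1 = (3 - 1)(2 - 1) = 2.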

The problem in hand, then, may readily be solved as follows: 
From Tables 36 and 38, it is found that 

1 Not only is the $\chi^2$ distribution derived from the multinomial distribution
by approximation, but it must be recalled that the derivation of the multinomial
distribution itself is based on the assumption of sampling with
replacements; when this is not true in practice, as is usually the case, the
multinomial distribution gives merely approximate probabilities that are
sufficiently accurate only for samples that are small relative to the
population.
$$\sum \frac{(N_i - Np_i)^2}{Np_i} = \frac{(380 - 420)^2}{420} + \frac{(150 - 120)^2}{120} + \frac{(70 - 60)^2}{60} + \frac{(320 - 280)^2}{280} + \frac{(50 - 80)^2}{80} + \frac{(30 - 40)^2}{40} = 32.44$$


Examining the $\chi^2$ table for n = 6 - 3 - 1 = 2, it is seen that
$\chi^2 = 32.44$ lies far beyond the .05 point (5.991). Hence, on the
assumption that the upper .05 tail of the distribution is taken as the
region of rejection, the hypothesis of independence cannot be
accepted in this case; and it may be concluded that there is some
association between color and sex in the given city. The general
nature of that association is indicated by the sample, which suggests
that the percentage of males is greater among black and other-color
persons than among white persons, or, to put it another way,
that the percentage of females is greater among white persons than
among black persons or persons of other colors.
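The whole test of independence for a table of any size can be sketched in a few lines of Python. The sketch computes the expected frequency of each cell from the marginal totals (row total times column total divided by N), which is the same as multiplying the estimated marginal probabilities as above. It then compares the statistic with the .05 point for n = (r - 1)(c - 1). The critical value 5.991 is the text's table value and applies only when n = 2; the function name is an illustrative choice.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Table 36: rows are whites, blacks, other colors; columns are males, females.
table_36 = [[380, 320], [150, 50], [70, 30]]
stat, df = chi_square_independence(table_36)
print(f"chi-square = {stat:.2f}, n = {df}, beyond .05 point: {stat > 5.991}")
# prints: chi-square = 32.44, n = 2, beyond .05 point: True
```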

Testing Homogeneity. Tests of independence that are used
to determine the homogeneity of a sample are of sufficient
importance to warrant special mention. Suppose, for example,
that intelligence tests are given to 1,000 students in a
coeducational university and also to 1,000 students in a university for
men students only. Suppose that the distribution of grades in
university A is quite different from that of university B, the
latter showing in general a tendency toward higher grades.
Because of this result, authorities in B infer that they are getting
a more intelligent type of student than university A. Authorities
in A, however, claim that the comparison is not fair, since,
according to their contention, the women students in A are not
generally as intelligent as the men students and the sample from
A is consequently not a homogeneous one. Fortunately, it is
possible to test the claim of the A authorities by applying the
test of independence.

Suppose that the 1,000 grades from university A are cross
classified with respect to intelligence grade and sex, with the
results as shown in the table on page 339.

Table 39.—1,000 Students Classified as to Sex and Intelligence Grades

Grades        Men    Women    Totals
91-100         26       27        53
81-90          84       73       157
71-80         120      108       228
61-70         131      125       256
51-60          95       95       190
41-50          48       46        94
31-40          12       10        22
Totals        516      484     1,000

The question to be answered by these data is: Is the distribution
of intelligence grades in university A independent of sex,
or does sex make a difference? That is, can the distributions
of men's and women's grades reasonably be taken to be two
samples from a single homogeneous population? If they can,
then authorities in A are probably wrong; at least the test of
independence does not support their claim. If the distributions
of men's and women's grades cannot reasonably be taken to be
two samples from a single homogeneous population, then the
authorities in A are right and the combination of the men's and
women's grades produces a heterogeneous group that is not
comparable with the presumably homogeneous group of grades
from university B.

What do the data show? To answer this question would 
require a repetition of the sort of numerical work carried out in 
the previous example. The reader is therefore left to find out 
the answer for himself. He will find it a good exercise. 1 

Caution. This chapter may well be concluded with a note
of warning. It is always to be remembered that, when a hypothesis
is not disproved, it is not necessarily proved. The $\chi^2$ test of
goodness of fit affords a good illustration of this. As already
indicated, if a normal curve gives a good fit to a sample histogram,
it is no proof that the sample came from a normal population.
Any other curve that would give approximately the same
theoretical frequencies as the normal curve would be an equally
tenable hypothesis. Furthermore, it is to be noted that the $\chi^2$
test takes no account of sign. A histogram may differ from a
curve in a negative manner on one side of a central point and
in a positive manner on the other. Still, if the differences are
small, the $\chi^2$ test may not reveal this obvious lack of conformity.
If a hypothesis is not disproved by one test, it may thus be
disproved by another. It is therefore well to examine a hypothesis
from all angles before accepting it, even tentatively.

1 The procedure is to set up the hypothesis of independence and from this
and from the marginal totals to calculate a set of theoretical frequencies
that may be compared with the actual frequencies by the $\chi^2$ test.





CHAPTER XIV 


JOINT SAMPLING FLUCTUATIONS 
IN MEAN AND STANDARD DEVIATION 

Previous sampling analysis has concentrated attention on a
single statistic and a single population parameter. Hypotheses
regarding that parameter were tested and confidence limits
established. Sometimes, however, problems arise in which two
or more parameters are involved. For example, the question
might be asked whether a given sample could have come from a
normal population with a specified mean and a specified standard
deviation. 1 A problem might also require joint confidence limits
for the mean and standard deviation of a normal distribution.
It is with these problems that the present chapter will be
concerned. Similar questions could be asked with respect to normal
bivariate or multivariate populations or to nonnormal populations
in general, but these more difficult problems will not be
discussed here.

DERIVATION OF JOINT SAMPLING DISTRIBUTION OF MEAN 
AND STANDARD DEVIATION 

To answer questions about both the mean and the standard 
deviation of a normal population it is necessary to know the joint 
sampling distribution of the sample mean and standard deviation. 
This section will discuss the derivation of such a joint distribution. 

1 Previous analyses of sampling fluctuations in means and standard deviations
were concerned with the following questions: (1) Given the standard
deviation of the population as a known quantity, did the sample in hand
come from a population in which the mean had some specified value? To
answer this question the normal curve was used. (2) Regardless of what
the standard deviation of the population might be, did the sample come from
some population in which the mean has a specified value? Here the t
distribution was used. (3) Regardless of what the mean of the population
might be, did the given sample come from some population in which the
standard deviation has some specified value? The answer to this question
required the use of the $\chi^2$ distribution.

In the present problem both the mean and the standard deviation are
specified by hypothesis.


Numerical Illustration. For samples of 2 from a normal
population, with $\tilde X = 100$ and $\tilde\sigma = 10$, the joint distribution
of the sample mean and sample standard deviation may readily
be found from Fig. 93 (page 318). To find those samples, for
example, whose means lie between 95 and 97 and whose standard
deviations lie between 2 and 4 (variances between 4 and 16), it
is merely necessary to find the samples that lie between lines
AB and A'B' and also between the lines GH and EF and the
lines G'H' and E'F'. The procedure is illustrated in Fig. 96.
In this figure the samples sought are those lying in squares abcd
and a'b'c'd'. The probability of these samples, as given by
Fig. 93, is the probability of a sample with a mean between 95
and 97 and a standard deviation between 2 and 4. This
probability can be entered in a joint distribution table such as is
pictured in Fig. 97. When this has been done for all possible
combinations of values for the mean and standard deviation, the
result is that shown in Fig. 98. Figure 98 depicts the joint
distribution of the mean and standard deviation of samples of 2.

Fig. 96. The sum of probabilities in cell abcd is 981.7/100,000. The sum of
probabilities in cell a'b'c'd' is the same. Hence the sum in both together is
twice 981.7/100,000. (The probabilities in the figure are expressed in
100,000ths.)

For larger samples the joint distribution of the mean and
standard deviation looks more like Fig. 99.

Fig. 97. The probability of samples with a mean between 95 and 97 and a
standard deviation between 2 and 4. (The probability in the figure is in
100,000ths.)

A study of Fig. 98 will reveal that the distribution of sample
standard deviations is the same in form, whatever the mean
value of the sample. For samples with means lying between 115
and 117, for example, the probability of a sample having a
standard deviation lying between 0 and 2 is equal to 198.6/100,000,
the probability of a sample having a standard deviation lying
between 2 and 4 is 183.6/100,000, the probability of a sample
having a mean between 115 and 117 and a standard deviation lying
between 4 and 6 is 156.8/100,000, etc. For samples with means
lying between 117 and 119, the probabilities of various standard
deviations are 101.2/100,000, 93.5/100,000, 79.9/100,000, etc.
These are proportional to the former probabilities.
That is, 198.6/101.2 ≈ 183.6/93.5 ≈ 156.8/79.9 ≈ 1.96, which
indicates that the two distributions of sample standard deviations
are the same in form. This is true for all columns of Fig. 98,
indicating that the distribution of sample standard deviations is
independent of the value of the sample mean. This is true in
general. For a normal population the sampling distribution of
sample standard deviations is thus independent of the value
of the sample mean, whatever the size of the sample. 1
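The same independence can be seen in a small simulation. The Python sketch below draws many samples of 2 from a normal population with mean 100 and standard deviation 10 and groups them by the value of the sample mean. Within each group it averages the sample standard deviation, computed with divisor N as in the text. If the distribution of σ does not depend on the sample mean, the group averages should agree with one another, and with the value $\tilde\sigma/\sqrt{\pi} \approx 5.64$ that holds for N = 2, up to simulation noise. The group boundaries, number of samples, and seed are arbitrary choices. Agreement of averages is consistent with independence but is, of course, not a proof of it.

```python
import math
import random


def sd_by_mean_group(n_samples=60_000, n=2, mu=100.0, sigma=10.0, seed=1):
    """Average sample standard deviation (divisor n) within groups of sample means."""
    rng = random.Random(seed)
    groups = {"mean < 95": [], "95 <= mean <= 105": [], "mean > 105": []}
    for _ in range(n_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(xs) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
        if mean < 95:
            groups["mean < 95"].append(sd)
        elif mean > 105:
            groups["mean > 105"].append(sd)
        else:
            groups["95 <= mean <= 105"].append(sd)
    return {name: sum(v) / len(v) for name, v in groups.items()}


averages = sd_by_mean_group()
for name, avg in averages.items():
    print(f"{name:>18}: average sample sd = {avg:.2f}")
print(f"theory for N = 2: {10 / math.sqrt(math.pi):.2f}")
```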


Fig. 98.—Probability set of sample means and sample standard deviations.
Σ(P) = 100,000. N = 2, n = 1.

The sample standard deviation being independent of the
sample mean, the joint probability of any given sample mean and
any given sample standard deviation is just the product of their
two individual probabilities. The equation for this joint
distribution is therefore the product of Eqs. (1) and (3) of
Chapter X. Accordingly, the joint distribution of the mean and
standard deviation is given by

$$dP(\bar X, \sigma) = \frac{\sqrt{N}}{\sqrt{2\pi}\,\tilde\sigma}\exp\left[-\frac{N(\bar X - \tilde X)^2}{2\tilde\sigma^2}\right]\cdot\frac{N^{(N-1)/2}\,\sigma^{N-2}}{2^{(N-3)/2}\,\Gamma\left(\frac{N-1}{2}\right)\tilde\sigma^{N-1}}\exp\left(-\frac{N\sigma^2}{2\tilde\sigma^2}\right)d\bar X\,d\sigma \qquad (1)$$

1 See Appendix to Chap. X, p. 263.
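Equation (1) can be checked numerically. The Python sketch below codes the density as written (the normal density of the mean times the density of σ, with σ computed using divisor N) and integrates it by the midpoint rule over a rectangle of $(\bar X, \sigma)$ values. For samples of 11 from a population with mean 100 and standard deviation 10, the total should come out very close to 1. The integration limits, step size, and function names are illustrative assumptions.

```python
import math


def joint_density(xbar, sd, n, mu=100.0, sigma=10.0):
    """Eq. (1): density of the sample mean and sample sd (divisor n), normal population."""
    if sd <= 0.0:
        return 0.0
    # Normal density of the sample mean, standard error sigma / sqrt(n).
    mean_part = (math.sqrt(n) / (math.sqrt(2 * math.pi) * sigma)
                 * math.exp(-n * (xbar - mu) ** 2 / (2 * sigma ** 2)))
    # Density of the sample sd, written with logarithms to avoid overflow for large n.
    log_sd_part = (((n - 1) / 2) * math.log(n) - ((n - 3) / 2) * math.log(2)
                   - math.lgamma((n - 1) / 2) - (n - 1) * math.log(sigma)
                   + (n - 2) * math.log(sd) - n * sd ** 2 / (2 * sigma ** 2))
    return mean_part * math.exp(log_sd_part)


def total_probability(n, step=0.2, mean_range=(60.0, 140.0), sd_max=40.0):
    """Midpoint-rule integral of Eq. (1) over a rectangle holding nearly all the mass."""
    lo, hi = mean_range
    n_mean = round((hi - lo) / step)
    n_sd = round(sd_max / step)
    total = 0.0
    for i in range(n_mean):
        xbar = lo + (i + 0.5) * step
        for j in range(n_sd):
            total += joint_density(xbar, (j + 0.5) * step, n)
    return total * step * step


print(f"N = 11: total probability = {total_probability(11):.4f}")
```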


USE OF JOINT SAMPLING DISTRIBUTION OF MEAN AND 
STANDARD DEVIATION 

Testing Hypotheses. The use of the joint sampling distribution
of mean and standard deviation for testing hypotheses may
be illustrated with reference to Fig. 99, which gives the joint
distribution of the mean and standard deviation for random
samples of 11 from a normal population in which the mean is
equal to 100 and the standard deviation is equal to 10. The
question to be considered will be this: If a certain sample of
11 cases has a mean of 95 and a standard deviation of 13, is it
reasonable to suppose that it came from a population whose
mean is 100 and whose standard deviation is 10?

Choice of Region of Rejection. In answering this question
the first step is to adopt a coefficient of risk that will determine
the risk of rejecting the hypothesis when it is true. Let this be
set at .05. The second step is to choose an appropriate region
of rejection. This requires careful study.

Suggested Regions. Various regions of rejection suggest
themselves. First are regions of rejection based entirely on the
sample mean. Figure 100 shows a balanced region of this kind
(call this region Ia), and Fig. 101 shows a region covering only
the lower mean values (call this region Ib). A third region of
this sort might cover only higher mean values but is not here
depicted by a figure (call this region Ic).

A similar set of regions could be based entirely on the sample
standard deviations. Figure 102 shows a balanced region of
this kind (call this region IIa), and Fig. 103 shows a region
covering only larger standard deviation values (call this region
IIb). 1 A third region (not shown graphically) might cover only
lower values of σ (call this region IIc).
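For the sample in question (N = 11, mean 95, standard deviation 13, with σ computed using divisor N as elsewhere in the book), the statistics on which regions I and II rest can be computed directly. The Python sketch below forms $z = (\bar X - \tilde X)/(\tilde\sigma/\sqrt{N})$ for the mean. For the standard deviation it forms $N\sigma^2/\tilde\sigma^2$, which follows the $\chi^2$ distribution with n = N - 1 = 10. It then checks both against the balanced regions Ia (two .025 tails of the normal curve, beyond ±1.96) and IIa (below the .98 point or above the .02 point of $\chi^2$ for n = 10, namely 3.059 and 21.161, following the footnote's choice of table points). These critical values are standard table values supplied here, not figures quoted from the book.

```python
import math

N, pop_mean, pop_sd = 11, 100.0, 10.0
sample_mean, sample_sd = 95.0, 13.0        # sample sd uses divisor N

# Regions I: the sample mean, with the population sd taken as known.
z = (sample_mean - pop_mean) / (pop_sd / math.sqrt(N))
# Regions II: N * sigma^2 / pop_sd^2 follows chi-square with n = N - 1.
chi2 = N * sample_sd ** 2 / pop_sd ** 2

in_region_Ia = abs(z) >= 1.96                     # balanced, .05 in total
in_region_IIa = chi2 <= 3.059 or chi2 >= 21.161   # balanced, .04 in total (n = 10)
print(f"z = {z:.3f}, chi-square = {chi2:.2f}")
print(f"falls in region Ia: {in_region_Ia}; falls in region IIa: {in_region_IIa}")
# prints:
# z = -1.658, chi-square = 18.59
# falls in region Ia: False; falls in region IIa: False
```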

1 For the standard deviation the .02 and .98 points of the $\chi^2$ distribution
are used because the table of $\chi^2$ is so set up that these values, and not the
.025 points, can be obtained; the total probability for the regions of Fig. 102
is .04 and not .05. While the regions of rejection based on the standard
deviation are thus not strictly comparable with those based on the mean,
their position is good enough to illustrate the idea of regions of rejection.

A third set of regions could be based on the statistic

$$t = \frac{\sqrt{N}\,(\bar X - \tilde X)}{\hat\sigma}$$

which involves both the sample mean and the sample standard
deviation. 1 To note how such regions are set up, consider the
following relationships: In Fig. 99, $\bar X - \tilde X$ is the distance of the
sample from the vertical line through the population mean
value $\bar X = \tilde X$. Likewise, $\hat\sigma$ is equal to $\sigma\sqrt{N/(N-1)}$ and is thus
equal to $\sqrt{N/(N-1)}$ multiplied by the distance of the sample
from the horizontal axis, i.e., the $\bar X$-axis.
The statistic $\sqrt{N}(\bar X - \tilde X)/\hat\sigma$ is thus proportional to the
cotangent of the angle that the line connecting the sample point with
the point $(\tilde X, 0)$ makes with the horizontal axis (angle β of
Fig. 104).

A balanced region of rejection based on the sample value of
$\sqrt{N}(\bar X - \tilde X)/\hat\sigma$ would thus look like that marked off in
Fig. 104 (call this region IIIa). A similar region based solely
on negative values of t is shown in Fig. 105 (call this region IIIb).
A third region could be based solely on positive values of t
but is not here depicted by a figure (call this region IIIc).

A fourth type of region that is considered (call it region
IV) is a region based on a special set of contours known as
“λ contours.” 2

1 $\hat\sigma$ is the optimum estimate of $\tilde\sigma$, based on the sample standard deviation,
and is equal to $\sigma\sqrt{N/(N-1)}$.

2 Cf. Neyman, J., and E. S. Pearson, “On the Use and Interpretation of
Certain Test Criteria for Purposes of Statistical Inference,” Biometrika,
Vol. 20A (1928), pp. 175-240.

The quantity λ is called the “likelihood ratio.” The likelihood
of a sample, it will be recalled, is the probability of that
sample on the basis of certain assumed values of the population
parameters. The likelihood ratio that Profs. Neyman and
Pearson suggest is the ratio of the likelihood of the given sample
on the assumption that the population has the mean and standard
deviation given by the hypothesis in question to the likelihood
of the given sample on the assumption that the population mean
and standard deviation are equal to the joint maximum-likelihood
estimates of these parameters. This ratio is given by


X . <*»+*-« 


log 10 X = ~ [logi„ »S 2 - (M 2 + S 2 - 1)](.4343) (3) 


where M — -> S — and .4343 = logi 0 e. 

o o 

Equation (3) may be written more succinctly as follows: 

logic x = | (.4343 - k) (4) 

in which Jc = .4343 (M- 2 + S 2 ) — logio S 2 (5) 

This is the equation for the X contours. 
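As a check on Eqs. (3) to (5), the following Python sketch (an illustration under the normal-population assumption; all names are hypothetical) computes log₁₀ λ directly as the difference of two log-likelihoods for a simulated sample of 11 and compares it with (N/2)(.4343 − k). It uses the exact value of log₁₀ e in place of .4343.

```python
import math
import random

def k_statistic(xbar, sigma, pop_mean, pop_sd):
    """Eq. (5): k = .4343 (M^2 + S^2) - log10 S^2."""
    m = (xbar - pop_mean) / pop_sd
    s = sigma / pop_sd
    return math.log10(math.e) * (m**2 + s**2) - math.log10(s**2)

def log10_lambda(n, k):
    """Eq. (4): log10 lambda = (N/2)(.4343 - k)."""
    return (n / 2) * (math.log10(math.e) - k)

def log10_likelihood(data, mu, sd):
    """log10 of the joint normal density of the observations for given mu and sd."""
    ln = sum(-0.5 * math.log(2 * math.pi * sd**2) - (x - mu)**2 / (2 * sd**2) for x in data)
    return ln / math.log(10)

random.seed(1)
data = [random.gauss(100, 10) for _ in range(11)]
n = len(data)
xbar = sum(data) / n
sigma = math.sqrt(sum((x - xbar)**2 for x in data) / n)  # maximum-likelihood SD (divisor N)

# Hypothesis X~ = 100, sigma~ = 10 against the joint maximum-likelihood estimates.
direct = log10_likelihood(data, 100, 10) - log10_likelihood(data, xbar, sigma)
via_k = log10_lambda(n, k_statistic(xbar, sigma, 100, 10))
print(round(direct, 6), round(via_k, 6))
```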


Table 40. Illustrating the Calculations Necessary for Graphing a λ-contour Region of Rejection (Region IV)

  (1)      (2)      (3)          (4)            (5)          (6)       (7)          (8)
   σ       σ²      log σ²   log σ² − 1.3050  (4)/.004343   (5) − σ²    √(6)    X̄ = 100 ± (7)
   5.5    30.25    1.4807        .1757          40.46        10.21     ±3.2     103.2    96.8
   6.0    36.00    1.5563        .2513          57.86        21.86     ±4.7     104.7    95.3
   7.0    49.00    1.6902        .3852          88.69        39.69     ±6.3     106.3    93.7
   8.0    64.00    1.8062        .5012         115.40        51.40     ±7.2     107.2    92.8
   9.0    81.00    1.9085        .6035         138.96        57.96     ±7.6     107.6    92.4
  10.0   100.00    2.0000        .6950         160.03        60.03     ±7.8     107.8    92.2
  11.0   121.00    2.0828        .7778         179.09        58.09     ±7.6     107.6    92.4
  12.0   144.00    2.1584        .8534         196.50        52.50     ±7.2     107.2    92.8
  13.0   169.00    2.2279        .9229         212.50        43.50     ±6.6     106.6    93.4
  14.0   196.00    2.2923        .9873         227.33        31.33     ±5.6     105.6    94.4
  15.0   225.00    2.3522       1.0472         241.12        16.12     ±4.0     104.0    96.0
  15.5   240.25    2.3807       1.0757         247.68         7.43     ±2.7     102.7    97.3
The calculations shown in Table 40 are solutions of an equation derived from Eq. (5). By interpolating in Table 41 (page 361) for the value of k for which Pλ is .05 in samples of 11, k is found to be .695. Hence, by Eq. (5), if σ̃ equals 10 and X̃ equals 100,

$$ .695 = .4343\left[\frac{(\bar X - 100)^2}{100} + \frac{\sigma^2}{100}\right] - \log\sigma^2 + \log 100 $$

and since (X̄ − 100)² = x²,

$$ .695 = .004343\,x^2 + .004343\,\sigma^2 - \log\sigma^2 + 2 $$

$$ x^2 = -\sigma^2 + \frac{\log\sigma^2 - 1.305}{.004343} $$

$$ \bar X = 100 \pm \sqrt{\frac{\log\sigma^2 - 1.305}{.004343} - \sigma^2} $$

This last is the form in which solutions are found for X̄ corresponding to the specified values of σ shown in column (1) of Table 40. The corresponding values of X̄ are shown in the two subcolumns of column (8). The graph of columns (1) and (8) gives the elliptical region depicted in Fig. 106. This is a picture of region IV, which is a λ-contour region.
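The construction of Table 40 can be sketched in Python as follows. The sketch assumes linear interpolation in the N = 11 row of Table 41 to locate k at P = .05, and then solves Eq. (5) for the two sample means on the contour at each trial σ. Names are illustrative; because it uses the exact log₁₀ e and does not round column (7), the printed means can differ from Table 40 by about .1.

```python
import math

LOG10_E = math.log10(math.e)  # the text's .4343

# (k, P_lambda) pairs for N = 11, copied from Table 41.
TABLE41_N11 = [(0.65, 0.0822), (0.70, 0.0461), (0.80, 0.0145), (0.85, 0.0081)]

def k_for_probability(table, p):
    """Linearly interpolate the k at which the tabled probability equals p.
    P_lambda falls as k rises, so each bracket satisfies p1 <= p <= p0."""
    for (k0, p0), (k1, p1) in zip(table, table[1:]):
        if p1 <= p <= p0:
            return k0 + (p0 - p) * (k1 - k0) / (p0 - p1)
    raise ValueError("p is outside the tabled range")

def region_iv_boundary(sigma, k, pop_mean=100.0, pop_sd=10.0):
    """Solve Eq. (5) for the sample means on the lambda contour at sample SD sigma.
    Returns (lower, upper), or None if the contour does not reach this sigma."""
    c = LOG10_E / pop_sd**2  # .004343 when pop_sd = 10
    radicand = (math.log10(sigma**2) - math.log10(pop_sd**2) + k) / c - sigma**2
    if radicand < 0:
        return None
    half = math.sqrt(radicand)
    return pop_mean - half, pop_mean + half

k05 = k_for_probability(TABLE41_N11, 0.05)
print(f"k at P = .05 (N = 11): {k05:.3f}")
for sigma in (5.5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 15.5):
    lower, upper = region_iv_boundary(sigma, k=0.695)
    print(f"{sigma:5.1f}  {upper:6.1f}  {lower:6.1f}")
```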

A final set of regions that may be suggested is a type of region that cuts off a corner section of the distribution of sample means and standard deviations. Such regions are pictured in Figs. 107 to 109. These will be called, as a class, regions Va, Vb, and Vc.

Discussion of the Various Regions of Rejection. The reason why so many regions of rejection are suggested as possible alternatives is that the best region to choose in any particular instance depends on the circumstances of the problem. The regions suggested here include those likely, under varying circumstances, to be found most useful. Their relative advantages and disadvantages may now briefly be considered.

First note that all regions having a probability of .05 would lead to a rejection of the hypothesis 5 times out of 100 when it was true. Region I has the disadvantage that it would permit acceptance of hypotheses in cases in which the sample was very improbable because the sample standard deviations were very large or very small. Similarly, region II would have the disadvantage of leading to the acceptance of hypotheses in cases in which the sample mean was far distant from the hypothetical mean of the population. Region III, based on both sample means and sample standard deviations, might presumably be thought to avoid this disadvantage of regions I and II of including improbable samples; but unfortunately this does not follow. As may be seen from Fig. 104, region III accepts the hypothesis in cases where the standard deviation is very large because the mean happens to be very small or very large at the same time, although both are very unlikely on the basis of the given hypothesis. Also, region III accepts the hypothesis when the sample standard deviation is very small because the mean is also very close to the hypothetical value.

Region IV avoids the disadvantage of accepting the hypothesis on the basis of highly improbable samples, in that it has a certain elliptical symmetry about the center of the distribution of sample means and variances.

All these regions, it should be repeated, will lead to the rejection of the hypothesis when it is true in 5 per cent of the trials; but only regions like IV will lead to rejection of the hypothesis in every case in which the sample itself is very improbable on the basis of that hypothesis.

More important, however, than the criterion that a region should include the more improbable samples is the criterion that a region should lead to rejection of the hypothesis most frequently when the hypothesis is not true. Which region fares best on this more important criterion will depend on the nature of the problem in hand. If the investigator desires to avoid acceptance of the hypothesis especially when the true value of the mean is less than the hypothetical mean being tested, region Ib should be used, for this will lead to rejection of the hypothesis more often when the true mean is less than the hypothetical mean.¹ Region IIIb is also a good region to use in this case, but it does not appear to be as good as region Ib, based only on the means.

On the other hand, if the investigator wishes to avoid acceptance of the hypothesis especially when the true value of σ̃ is greater than the hypothetical value, region IIb should be used; for when σ̃ is increased, the distribution is stretched upward and its vertical mean moves higher. In this instance, region IIIa, for certain values of σ̃, and region IIc, for all high values, are worse than useless, since if the true σ̃ were greater than the hypothetical σ̃ the probability of a sample falling in the region of rejection would be less than if the hypothetical σ̃ were the true σ̃. Thus, if region IIc is used under these circumstances, there is more chance of accepting the hypothesis when it is not true (because σ̃ is actually larger) than there would be if it were true. On the other hand, region IIIa, for certain values of σ̃, or region IIc, for all lower values, is not so bad in cases in which the true σ̃ is less than the hypothetical σ̃.

¹ For as the distribution is shifted to the left by decreasing X̃, a larger percentage of the samples tends to fall in the established region of rejection.

Instances in which region IIIa or IIIb appears to be particularly useful are those in which the investigator wishes especially to reject the hypothesis when in fact the true mean is less or greater than the hypothetical mean and at the same time the true σ̃ is less than the hypothetical σ̃. For in this instance the distribution is distorted so that more of the samples would fall in the established region of rejection.

Region IV is a compromise region that tends to give a high 
probability of rejection in whatever manner the true means and 
standard deviations differ from their hypothetical values. For 
such instances Region IV has been offered as an especially good 
region. 

The region of rejection that would be best in most instances would probably be one that maximized the probability of rejecting the given hypothesis when in fact the true mean was less and the true σ̃ greater than the hypothetical values. A manufacturer of tires, for example, would be concerned about getting tires that rendered less mileage on the average than desired and varied more from tire to tire. Of course, other cases would arise in which the investigator would wish to avoid acceptance of the hypothesis when in fact the mean was greater than the hypothetical value and the standard deviation was greater or less. It is believed, however, that the former instance is more common.

A region bounded by a smooth continuous curve would appear to be the best type of region to adopt for the reason explained in the preceding paragraph. Such a region is depicted in Fig. 107. But to find a curve that would permit the ready calculation of probabilities is not so easy. Probably the simplest region would be a rectangular one such as that pictured in Fig. 108. A method of defining such a region in practical cases will be discussed below. A region like that pictured in Fig. 109 would involve the same difficulties as one like that shown in Fig. 107.


Table 41. Probability of a Value of λ as Great as, or Greater than, a Specified Value for Various Values of k = .4343 − (2/N) log₁₀ λ*

       N = 3             N = 4             N = 5
    k       Pλ        k       Pλ        k       Pλ
   1.50   .0830      1.25   .0578      1.05   .0568
   1.80   .0415      1.30   .0486      1.10   .0450
   2.40   .0106      1.50   .0243      1.40   .0113
   2.70   .0053      1.80   .0086      1.45   .0090

       N = 6             N = 7             N = 8
    k       Pλ        k       Pλ        k       Pλ
    .90   .0666       .85   .0551       .80   .0512
    .95   .0499       .90   .0389       .85   .0341
   1.20   .0117      1.05   .0137      1.00   .0101
   1.25   .0088      1.10   .0097      1.05   .0068

       N = 9             N = 10            N = 11
    k       Pλ        k       Pλ        k       Pλ
    .75   .0534       .70   .0625       .65   .0822
    .80   .0336       .75   .0371       .70   .0461
    .90   .0133       .85   .0131       .80   .0145
    .95   .0084       .90   .0078       .85   .0081

* Abridged from more elaborate tables in J. Neyman and E. S. Pearson, “On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference,” Biometrika, Vol. 20A (1928), pp. 175-240, 238-240.


Illustrations. The two regions whose use will be illustrated here will be region IV, based on the λ contours, and region Vb. These regions are the ones that would probably be used in most instances when a joint hypothesis was being tested. Region IV would be used when the investigator is indifferent to what the actual values of X̃ and σ̃ might be, and region Vb would be used when the investigator is especially troubled about these values deviating from the hypothetical values in a particular direction. Regions based entirely on either the mean or the standard deviation alone would probably not be used in cases in which a joint hypothesis is being tested; it is unlikely that the investigator would be indifferent to the actual value of the standard deviation, say, when the hypothesis does involve this quantity.


Table 42. Values of the Ratio Pλ/λ for Various Values of k*

     k      Pλ/λ         k      Pλ/λ         k      Pλ/λ
    .435   1.0008       .520   1.0965       .605   1.2024
    .440   1.0062       .525   1.1024       .610   1.2089
    .445   1.0116       .530   1.1084       .615   1.2155
    .450   1.0170       .535   1.1144       .620   1.2221
    .455   1.0224       .540   1.1205       .625   1.2287
    .460   1.0279       .545   1.1266       .630   1.2353
    .465   1.0334       .550   1.1328       .635   1.2420
    .470   1.0390       .555   1.1390       .640   1.2488
    .475   1.0446       .560   1.1452       .645   1.2555
    .480   1.0502       .565   1.1514       .650   1.2623
    .485   1.0559       .570   1.1576       .700   1.332
    .490   1.0616       .575   1.1639       .750   1.407
    .495   1.0673       .580   1.1702       .800   1.485
    .500   1.0731       .585   1.1766       .850   1.569
    .505   1.0789       .590   1.1830       .900   1.658
    .510   1.0847       .595   1.1894       .950   1.753
    .515   1.0906       .600   1.1959      1.000   1.855

* Reproduced from Table XII in J. Neyman and E. S. Pearson, “On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference,” Biometrika, Vol. 20A (1928), pp. 175-240, 235.


The Probability Distribution of λ. To make use of λ contours for regions of rejection, the probability distribution of λ must be determined. This has been done by Neyman and Pearson, and the results have been published in a table of probabilities. These are condensed in Tables 41 and 42. For N = 3 to N = 11 and for various values of k, the probability Pλ of getting as great or greater values of λ by chance is given in Table 41. These have been taken from a more elaborate table worked out by Neyman and Pearson. For larger values of N, Table 42 is found useful. This gives, for various values of k, the ratio Pλ/λ, affording a set of relationships that is practically independent of N.¹

¹ It will be noted that Pλ means the probability of as great or greater value of λ. If it had been written P(λ) it would have meant simply the probability of λ. Thus Pλ is the probability of a range of values beginning at λ and running to infinity.

Testing a Hypothesis with a λ Contour. To make use of these fundamental relationships in an actual problem, proceed as follows. Calculate

$$ M = \frac{\bar X - \tilde X}{\tilde\sigma}, \qquad S = \frac{\sigma}{\tilde\sigma}, \qquad k = .4343\,(M^2 + S^2) - \log_{10} S^2 . $$

If N is equal to or less than 10, use Table 41 to note whether the probability of as great or greater λ, given by

$$ \log_{10}\lambda = \frac{N}{2}\,(.4343 - k), $$

is less than or equal to .05 (or .01 if this is taken as the coefficient of risk). If it is less than or equal to the coefficient of risk, then the given sample must fall in the region of rejection marked off by the .05 (or .01) λ contour. If N is greater than 10, find from Table 42 the value of Pλ/λ corresponding to the value of k computed for the sample. Then compute log₁₀ λ = (N/2)(.4343 − k), and multiply Pλ/λ by this value of λ. The result is Pλ, the probability of as great or greater value of λ. If this is less than the adopted coefficient of risk, then the sample falls in the region of rejection marked off by the λ contour.

For illustration this procedure may be applied to the problem stated above on page 344. In the sample, N = 11, X̄ = 95, and σ = 13. It is desired to test the hypothesis that the mean of the population from which this sample was drawn is 100 and its standard deviation 10. The investigator is indifferent as to whether the actual values of the population mean and population standard deviation are greater or less than 100 and 10, respectively, and he decides upon a coefficient of risk of .05.

For these data, M = (95 − 100)/10 = −.5, and S = 13/10 = 1.3. Hence, M² = .25, S² = 1.69, and, since log₁₀ 1.69 = .2279,

k = (.4343)(.25 + 1.69) − .2279 = .6146.

Looking up this value of k in Table 41, it is found that, for N = 11 and k = .65, Pλ = .0822. Since Pλ decreases as k increases, for k = .6146 Pλ must be greater than .08 and certainly greater than .05. Accordingly, the sample in question does not fall in the region of rejection, and the hypothesis is not rejected.
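The procedure above can be sketched in Python as follows. The sketch assumes linear interpolation in Table 42 (the text does not state an interpolation rule) and uses the exact log₁₀ e for .4343; function and variable names are illustrative. Applied to the page-344 sample, it gives k ≈ .6146 and a Pλ of roughly .12, well above .05, so the hypothesis is not rejected by either route.

```python
import math

LOG10_E = math.log10(math.e)  # the text's .4343

# Table 42: k from .435 to .650 in steps of .005, then .70 to 1.00 in steps of .05.
K42 = [round(0.435 + 0.005 * i, 3) for i in range(44)] + [0.70, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00]
RATIO42 = [1.0008, 1.0062, 1.0116, 1.0170, 1.0224, 1.0279, 1.0334, 1.0390, 1.0446,
           1.0502, 1.0559, 1.0616, 1.0673, 1.0731, 1.0789, 1.0847, 1.0906, 1.0965,
           1.1024, 1.1084, 1.1144, 1.1205, 1.1266, 1.1328, 1.1390, 1.1452, 1.1514,
           1.1576, 1.1639, 1.1702, 1.1766, 1.1830, 1.1894, 1.1959, 1.2024, 1.2089,
           1.2155, 1.2221, 1.2287, 1.2353, 1.2420, 1.2488, 1.2555, 1.2623,
           1.332, 1.407, 1.485, 1.569, 1.658, 1.753, 1.855]
assert len(K42) == len(RATIO42)  # sanity check on the transcription
TABLE42 = list(zip(K42, RATIO42))

# Table 41, N = 11 row: (k, P_lambda).
TABLE41_N11 = [(0.65, 0.0822), (0.70, 0.0461), (0.80, 0.0145), (0.85, 0.0081)]

def interpolate(table, x):
    """Linear interpolation in a list of (x, y) pairs sorted by x."""
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("value outside the tabled range")

def lambda_test(n, xbar, sigma, pop_mean, pop_sd):
    """Return k (Eq. 5), lambda (Eq. 4) and P_lambda by the Table 42 route."""
    m = (xbar - pop_mean) / pop_sd
    s = sigma / pop_sd
    k = LOG10_E * (m**2 + s**2) - math.log10(s**2)
    lam = 10 ** ((n / 2) * (LOG10_E - k))
    return k, lam, interpolate(TABLE42, k) * lam

k, lam, p = lambda_test(11, 95, 13, 100, 10)
print(f"k = {k:.4f}, lambda = {lam:.4f}, P_lambda = {p:.4f}")
if k < TABLE41_N11[0][0]:
    # Table 41 route: P_lambda decreases as k increases, so k < .65 means P_lambda > .0822.
    print("Table 41: P_lambda exceeds .0822")
print("reject" if p <= 0.05 else "do not reject", "at the .05 coefficient of risk")
```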

Since N = 11, Table 42 could have been used instead of Table 41. Interpolation in Table 42 shows that, for k = .6146, Pλ/λ = 1.2150. For this value of k, the value of log₁₀ λ is (11/2)(.4343 − .6146) = −.9917, or 9.0083 − 10. This gives λ = .1019. Multiplying the value of Pλ/λ by this value of λ gives Pλ = (1.2150)(.1019) = .1238, which corroborates the previous conclusion that Pλ was greater than .08.

The calculations in Table 40 have illustrated the construction of a λ-contour region of rejection for the testing of a hypothesis as to the mean and standard deviation of the population. The resulting λ-contour region of rejection (region IV) is graphically depicted in Fig. 106. It will be noted that the λ contour is a symmetrical ellipse with the intersection of σ̃ and X̃ as its center.¹ Table 40 and Fig. 106 are presented as an aid in visualizing the principle involved in these problems; such a table and figure need not, of course, be constructed for an actual problem. The problem illustrated above was solved without the use of such a figure.

Use of a Corner Region Illustrated. The foregoing has been based on the assumption that the investigator is indifferent as to whether the true values of the mean and standard deviation are greater or less than the hypothetical values being tested. If an investigator is more concerned with the possibility that the true mean is less, say, than the hypothetical mean being tested and that the true standard deviation is greater than the hypothetical standard deviation, he will have the least probability of accepting the hypothesis when the true values actually bear this relationship to the hypothetical values if he locates his region of rejection entirely in the upper left-hand quadrant of the distribution of samples. Logically, it would seem appropriate to find the λ contour that marked off a .05 region in this upper quadrant. Unfortunately, this presents such great mathematical difficulties that the procedure is impracticable. The following discussion is therefore offered as a crude substitute:

¹ The ellipse obtained in this way is different from that of Fig. 111. The latter gives joint values for the population mean and standard deviation for given values of the sample mean and sample standard deviation (see p. 369).
The determination of a corner region of rejection may be explained with reference to the following problem: Suppose it is claimed that a new method of cultivating potatoes in a given locality will average a yield of 100 bushels per acre and a standard deviation of 10 bushels. Suppose that this offers a gain over the old method of cultivation in that the mean is greater and the standard deviation is less. To test this claim a farmer uses the new method of cultivation on 10 plots of ground of the same size and finds that the mean yield is 97 bushels and the standard deviation is 13.5 bushels. Can he reasonably reject the hypothesis that the average yield will in the long run be 100 bushels and the standard deviation 10 bushels?

If the true mean of the population is 100 bushels and the true standard deviation is 10 bushels, 50 per cent of samples of 10 will have an average yield equal to or less than 100. Likewise, if the true standard deviation is 10 bushels, 50 per cent of samples of 10 will have standard deviations equal to or greater than 9.15.* Accordingly, since the mean and the standard deviation of samples from a normal population are independently distributed, the probability of a sample of 10 having a mean less than 100 and a standard deviation greater than 9.15 is (.5)(.5) = .25.

If two other values for the mean and standard deviation can be found such that 20 per cent of the samples have means lying between 100 and this second mean value and standard deviations lying between 9.15 and this second standard deviation value, these second values for the mean and standard deviation will mark off a .05 region (.25 − .20 = .05) in the upper quadrant that would appear to be a good region of rejection for the present problem. The square root of .20 is .4472. Therefore, again by independence, if a second mean value can be found such that .4472 of the samples have means lying between this value and 100, and a second standard deviation value can be found such that .4472 of the samples have standard deviations lying between this second standard deviation value and 9.15, the two values for the mean and standard deviation so determined will mark off the desired region of rejection.

A table of normal probabilities shows that .4472 of the samples would fall between the mean and the mean less 1.618 standard deviations. Since the standard deviation of the mean is σ̃/√N, it would have the value 10/√10 = 3.16. Hence, .4472 of the samples would fall between 100 and 100 − 1.618(3.16) = 94.9.

* For n = 9, the probability is 50 per cent that a χ² will exceed 8.343; therefore, the probability is 50 per cent that Nσ²/σ̃² = 10σ²/10² will exceed 8.343, or that σ will exceed √83.43 = 9.15.
Similarly, a table of the χ² distribution¹ shows that, for n = 9, .4472 of the cases lie between the median value 8.343 and the value χ² = 16.774. Since Nσ²/σ̃² has a sampling distribution of the form of the χ² distribution, this means that .4472 of the samples will have a σ lying between 9.15 and 12.95; for 10σ²/100 = 16.774 gives σ = 12.95.

Accordingly, the desired .05 region of rejection will be such as the shaded area of Fig. 110. It will include all samples whose means are less than 94.9 and whose standard deviations are greater than 9.15 and also all samples whose means lie between 94.9 and 100 and whose standard deviations are greater than 12.95. Inasmuch as the given sample has a mean of 97 and a standard deviation of 13.5, it falls in this region of rejection and the hypothesis must be rejected.

Fig. 110. An example of a corner region of rejection.
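The corner region can be sketched in Python using the standard library's `statistics.NormalDist` for the normal quantile. The χ² points 8.343 and 16.774 and the median value 9.15 are taken from the text rather than recomputed (the standard library has no χ² quantile function), and the size of the region relies on the independence of the sample mean and standard deviation in normal samples. Names are illustrative.

```python
import math
from statistics import NormalDist

POP_MEAN, POP_SD, N = 100.0, 10.0, 10

# Mean cut-off: sqrt(.20) = .4472 of sample means fall between it and 100.
share = math.sqrt(0.20)
z = NormalDist().inv_cdf(0.5 + share)               # about 1.618
mean_cut = POP_MEAN - z * POP_SD / math.sqrt(N)     # standard error 10/sqrt(10)

# SD cut-offs from the chi-square (n = 9) points quoted in the text.
sd_median = 9.15                                    # median SD, as given in the text
sd_cut = math.sqrt(16.774 * POP_SD**2 / N)          # N sigma^2 / sigma~^2 = 16.774

def in_corner_region(xbar, sd):
    """True if a sample (xbar, sd) falls in the shaded corner region of Fig. 110."""
    if xbar < mean_cut and sd > sd_median:
        return True
    return mean_cut <= xbar < POP_MEAN and sd > sd_cut

# Size: P(mean < 100) P(sd > median) minus the excluded rectangle; each term is a
# product because the sample mean and SD are independent for a normal population.
size = 0.5 * 0.5 - share * share
print(f"mean cut {mean_cut:.1f}, SD cut {sd_cut:.2f}, size {size:.3f}")
print("potato sample rejected:", in_corner_region(97, 13.5))
```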

Determination of Joint Confidence Zone for the Mean and Standard Deviation. In the estimation of a single population parameter it was found possible to mark off a range of values in the neighborhood of the sample result that could be said to include the population value with a given degree of probability. Such a range of values was called a “confidence interval” for the given population parameter, and the probability of its including the true value of this parameter was called the “confidence coefficient.”

¹ Pearson's Tables for Biometricians and Statisticians was used for this purpose since it permits a finer interpolation than the table included in the Appendix of this book.

In a similar fashion, when two parameters are being estimated 
jointly, it is possible to lay off joint ranges of values that may be 
said to include the true population values of these parameters 
with a given degree of probability. This joint range of values is 
called a “confidence zone,” and the probability associated with 
it is its confidence coefficient. 

It will be recalled that the upper limit of the confidence 
interval for a single parameter was taken as the value of the 
parameter that would make the probability of the given sample 
result or a lower value just equal to a predetermined figure, say 
.025. Likewise, the lower confidence limit was the value of the 
parameter that would make the probability of the sample result 
or a higher value just equal to another predetermined quantity, 
again say .025. The sum of these two probabilities is equal to 
the selected coefficient of risk, and the complement of the sum 
is the confidence coefficient, say .95. 

Another way of looking at it is that the upper confidence limit is so chosen that the sample would fall exactly on the upper boundary of the lower region of rejection, and the lower confidence limit is so chosen that the sample would fall exactly on the lower boundary of the upper region of rejection.

In order to determine a joint confidence zone for two parameters a similar procedure is possible. The limits of the zone are determined such that if the population parameters have any of these limiting values then the sample will fall on the boundary of the region of rejection. In the case of the mean and standard deviation of the population a confidence zone with a confidence coefficient of .95 would be determined by finding the values of these parameters that would make the given sample fall on the λ contour marking the .05 region of rejection. The equation for the λ contours, it will be recalled,¹ is as follows:

$$ \log_{10}\lambda = \frac{N}{2}\left[\log_{10} S^2 - (M^2 + S^2 - 1)(.4343)\right] \qquad (3) $$

or

$$ k = .4343\,(M^2 + S^2) - \log_{10} S^2 \qquad (5) $$

in which k = .4343 − (2/N) log₁₀ λ, M = (X̄ − X̃)/σ̃, and S = σ/σ̃.

¹ See p. 353.
If hypothetical values are assigned to X̃ and σ̃ and if λ (or k) is given the value that makes the probability of as great or greater λ just equal to .05, the equation above will represent the boundary line of the .05 region of rejection to be used in testing the given hypothesis. If, however, a particular sample value is substituted for X̄ and σ and if λ (and hence k) is given its .05 value, then this equation will show what hypothetical values of X̃ and σ̃ would cause the given sample to fall exactly on the boundary line of the .05 region of rejection. When so used the equation becomes the formula for the boundary of the joint confidence zone for the mean and standard deviation.

Suppose, for example, that a random sample of 11 cases is found to have a mean of 95 and a standard deviation of 13. What is the joint confidence zone that may be said to cover the true values of the population mean and standard deviation with a probability of .95? For samples of 11, the value of k for which the probability of as great or greater λ is just .05 is found from interpolation in Table 41 to be approximately .695. On substituting this value of k and the given sample values of X̄ and σ in Eq. (5), the boundary of the joint confidence zone for the mean and standard deviation of the population is given by the following:

$$ .695 = .4343\left[\frac{(95 - \tilde X)^2}{\tilde\sigma^2} + \frac{169}{\tilde\sigma^2}\right] - \log 169 + \log\tilde\sigma^2 $$

which, on solving for X̃ (with 2.3026 = 1/.4343 and 2.9229 = .695 + log 169), gives

$$ \tilde X = 95 \pm \sqrt{-169 + 2.3026\,\tilde\sigma^2\,(2.9229 - \log_{10}\tilde\sigma^2)} $$

By substituting various values for σ̃ and obtaining the corresponding values of X̃, a graph may be drawn of the boundary line of the desired confidence zone. This has been done in Table 43 and depicted in a graph in Fig. 111.

The result so obtained may be interpreted as follows: The set of X̃, σ̃ values included within the oval-shaped curve of Fig. 111 constitutes the .95 zone of confidence for the mean and standard deviation of the population. In other words, there is a chance of .95 that one of the pairs of values included within the curve is the pair whose values are those of the actual mean and standard deviation of the population. Accordingly, any pair of values included within this curve is a reasonable hypothesis as to the true values of the mean and standard deviation of the population; it would be a hypothesis that would not cause the given sample to fall in its .05 λ-contour region of rejection (of the type shown in Fig. 106).

Any pair of values outside the oval-shaped curve of Fig. 111 would be an unreasonable hypothesis as to the true values of the mean and standard deviation of the population. The curve shows that the maximum tenable value for the mean of the population is approximately 107 and the minimum tenable value is approximately 83. It also shows that the maximum tenable value for the standard deviation of the population is approximately 25.3 and the minimum tenable value is approximately 8.2.

Fig. 111. The .95 joint confidence zone for X̃ and σ̃, given X̄ = 95, σ = 13, and N = 11.

It is to be noted, however, that these extreme values for 
the mean and standard deviation of the population are not 
jointly tenable. If the mean should be assumed to have the 
value 107, the only tenable value for the standard deviation 
of the population would be 17.5. If the standard deviation 
should be assumed to have the value 25.3, the only tenable value 
for the mean of the population would be 95. Joint tenability, of course, is the very essence of the diagram. It shows what range of values for one parameter is jointly tenable with a given value of the other parameter.

If the mean is assumed to be 85, for example, then the values of the standard deviation that are jointly tenable with this value of the mean are the values from 13 to 22. Likewise, if the standard deviation is assumed to have the value 20, say, the values of the mean that are then jointly tenable are the values from 84 to 106.


Table 43. Illustrating the Calculations Necessary for the Graphing of a Joint Confidence Zone for the Mean and Standard Deviation of the Population

  (1)   (2)     (3)       (4)          (5)         (6)       (7)       (8)        (9)
   σ̃    σ̃²   log σ̃²   2.9229 −     2.3026σ̃²   (5)·(4)  (6) − 169   √(7)     95 ± (8)
                         log σ̃²
    9    81    1.9085    1.0144      186.5106    189.20     20.20    ± 4.5    99.5   90.5
   10   100    2.0000     .9229      230.2600    212.51     43.51    ± 6.6   101.6   88.4
   11   121    2.0828     .8401      278.6146    234.06     65.06    ± 8.1   103.1   86.9
   12   144    2.1584     .7645      331.5744    253.49     84.49    ± 9.2   104.2   85.8
   13   169    2.2279     .6950      389.1394    270.45    101.45    ±10.1   105.1   84.9
   14   196    2.2923     .6306      451.3096    284.60    115.60    ±10.8   105.8   84.2
   15   225    2.3522     .5707      518.0850    295.67    126.67    ±11.3   106.3   83.7
   16   256    2.4082     .5147      589.4656    303.40    134.40    ±11.6   106.6   83.4
   17   289    2.4609     .4620      665.4514    307.44    138.44    ±11.8   106.8   83.2
   18   324    2.5105     .4124      746.0424    307.67    138.67    ±11.8   106.8   83.2
   19   361    2.5575     .3654      831.2386    303.73    134.73    ±11.6   106.6   83.4
   20   400    2.6021     .3208      921.0400    295.47    126.47    ±11.2   106.2   83.8
   21   441    2.6444     .2785     1015.4466    282.80    113.80    ±10.7   105.7   84.3
   22   484    2.6848     .2381     1114.4584    265.35     96.35    ± 9.8   104.8   85.2
   23   529    2.7235     .1994     1218.0754    242.88     73.88    ± 8.6   103.6   86.4
   24   576    2.7604     .1625     1326.2976    215.52     46.52    ± 6.8   101.8   88.2
   25   625    2.7959     .1270     1439.1250    182.77     13.77    ± 3.7    98.7   91.3


The equation to which the calculations above give solutions is as follows:

$$ .695 = .4343\left[\frac{(95 - \tilde X)^2}{\tilde\sigma^2} + \frac{169}{\tilde\sigma^2}\right] - \log 169 + \log\tilde\sigma^2 $$

or

$$ \tilde X = 95 \pm \sqrt{-169 + 2.3026\,\tilde\sigma^2\,(2.9229 - \log\tilde\sigma^2)} $$

This equation is Eq. (5) with k = .695, N = 11, X̄ = 95, and σ = 13.
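Table 43 and the extreme values read from Fig. 111 can be reproduced approximately with the Python sketch below, which solves Eq. (5) for X̃ at each trial σ̃ and scans a fine grid of σ̃ values for the edges of the zone. It assumes k = .695 (the Table 41 interpolation for N = 11) and the exact log₁₀ e, so the last figure may differ slightly from Table 43; the helper names are hypothetical.

```python
import math

LOG10_E = math.log10(math.e)  # the text's .4343

def zone_half_width(pop_sd, xbar=95.0, sigma=13.0, k=0.695):
    """Half-width in the population mean of the .95 joint confidence zone at a trial
    population SD, from Eq. (5) solved for the mean; None if pop_sd is outside the zone."""
    radicand = -sigma**2 + (pop_sd**2 / LOG10_E) * (k + math.log10(sigma**2) - math.log10(pop_sd**2))
    return math.sqrt(radicand) if radicand >= 0 else None

def tenable(pop_mean, pop_sd, xbar=95.0, sigma=13.0, k=0.695):
    """True if the pair (pop_mean, pop_sd) lies inside the confidence zone."""
    half = zone_half_width(pop_sd, xbar, sigma, k)
    return half is not None and abs(pop_mean - xbar) <= half

# Columns (1) and (9) of Table 43.
for sd in range(9, 26):
    half = zone_half_width(sd)
    print(f"{sd:3d}  {95 + half:6.1f}  {95 - half:6.1f}")

# Edges of the zone in the SD direction, found by scanning a grid of step .01.
inside = [s / 100 for s in range(500, 3001) if zone_half_width(s / 100) is not None]
print(f"tenable population SDs: about {min(inside):.2f} to {max(inside):.2f}")
```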

Joint Maximum-likelihood Estimates of the Mean and Standard Deviation. A final use of the joint distribution of the mean and standard deviation is to make joint maximum-likelihood estimates of the mean and standard deviation of the population. The equation for this joint distribution is Eq. (1). This equation may be viewed from two aspects. For a given value of each of the population parameters X̃ and σ̃, it may be viewed as giving the probability of getting various sample values for the statistics X̄ and σ; for given sample values of X̄ and σ, it may be viewed as giving the probability of getting such a sample result for various hypothetical values of the parameters X̃ and σ̃.

In testing hypotheses and determining confidence limits the first view of the equation has been adopted and illustrated; in determining maximum-likelihood estimates it is the second aspect of the equation that is significant. For, by definition, the maximum-likelihood estimates of the population mean and standard deviation are the values of these parameters that will make the probability of the given sample a maximum. Thus a study is made of how the probability of the sample varies with different hypothetical values for X̃ and σ̃, and the values that make this probability the greatest are the maximum-likelihood estimates. Algebraically the procedure is as follows:

Since the probability of the given sample will be a maximum 
when its logarithm is a maximum, the first step is to simplify 
Eq. (1) by taking logarithms. This gives the following result 

i.* pc*,) - -log- s -- >> >■*« 

+ other terms not involving X or d (6) 

or, since d| = d 2 /iV, 


log P(X,a] 


N(X 


Mi-nnog. 

4- Wms Tint ITlVolv 


Y nr 


The values of \(\bar{\bar X}\) and \(\bar\sigma\) that make this a maximum will be those
for which the partial derivatives with respect to these parameters
are equal to zero. Taking these derivatives and setting
them equal to zero gives the following results:

$$\frac{2N(\bar X-\bar{\bar X})}{2\bar\sigma^2} = 0$$

$$-\frac{N}{\bar\sigma} + \frac{N(\bar X-\bar{\bar X})^2 + N\sigma^2}{\bar\sigma^3} = 0$$

The common solutions of these two equations are \(\bar{\bar X} = \bar X\) and
\(\bar\sigma = \sigma\). These, then, are the joint maximum-likelihood estimates
of the mean and standard deviation of the population.
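The result can be checked numerically. The sketch below, in plain Python with a made-up sample and an arbitrary search grid (both assumptions for illustration only), evaluates the logarithm of the joint probability in the form obtained after substituting \(\bar\sigma_{\bar X}^2 = \bar\sigma^2/N\), dropping the terms that do not involve the parameters, and finds the grid point at which it is largest.

```python
import math


def log_likelihood(mean_bar, sd_bar, xbar, sd, n):
    """log P up to terms not involving the population parameters:
    -N log(sd_bar) - N(xbar - mean_bar)^2/(2 sd_bar^2) - N sd^2/(2 sd_bar^2)."""
    return (-n * math.log(sd_bar)
            - n * (xbar - mean_bar) ** 2 / (2 * sd_bar ** 2)
            - n * sd ** 2 / (2 * sd_bar ** 2))


# Hypothetical sample
sample = [12.0, 15.5, 9.8, 14.1, 11.7, 13.3, 10.9, 16.2]
n = len(sample)
xbar = sum(sample) / n
# Sample standard deviation with divisor N, as in the text
sd = math.sqrt(sum((x - xbar) ** 2 for x in sample) / n)

# Search a grid of hypothetical parameter values centred on the sample values
best = max(
    ((xbar + i * 0.01, sd + j * 0.01)
     for i in range(-200, 201) for j in range(-100, 101)
     if sd + j * 0.01 > 0),
    key=lambda p: log_likelihood(p[0], p[1], xbar, sd, n),
)
print(f"sample values: mean={xbar:.3f}, sd={sd:.3f}")
print(f"grid maximum:  mean={best[0]:.3f}, sd={best[1]:.3f}")
```

The grid maximum falls on the sample mean and sample standard deviation, as the derivation predicts.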






CHAPTER XV 

SAMPLING FLUCTUATIONS IN REGRESSION STATISTICS 


Lines and planes of regression are commonly used devices to
estimate one variable from one or more other variables. It is
therefore desirable to study sampling fluctuations in regression
statistics, higher-order variances, and lines and planes of regression
in order to determine the accuracy of the estimates made
from them. Such a study is the purpose of this chapter.

MAXIMUM-LIKELIHOOD ESTIMATES 
OF REGRESSION STATISTICS AND HIGHER-ORDER VARIANCES 

Two Variables. The problem to be considered here is this:
If a random sample has been taken from a given bivariate population,
what are the best estimates that may be made of the
regression parameters of the population, and how will these
estimates fluctuate from sample to sample? In what follows
it will be assumed that the population is a normal population
in which the lines of regression are the loci of mean values and the
distributions of cases around these lines are normal distributions.

Consider first the line of regression of \(X_1\) on \(X_2\), and let the
equation of this line in the population be represented by the
equation

$$X_1' = \bar a_{1.2} + \bar b_{12}X_2 \tag{1}$$

The problem is to estimate \(\bar a_{1.2}\) and \(\bar b_{12}\) and to determine the
sampling fluctuations of these estimates. This section will be
devoted to the first of these problems. Although the analysis
relates to only one of the lines of regression, it is equally applicable
to the other.

As in other cases the method of solving this problem will be
the method of maximum likelihood. In other words, the
estimates of \(\bar a_{1.2}\) and \(\bar b_{12}\) will be those that will make the logarithm
of the probability of the given sample a maximum. The procedure
is as follows: First note that any pair of \(X_1, X_2\) values
may be represented by a point in a plane, such as point P in
Fig. 112. If values are assigned to \(\bar a_{1.2}\) and \(\bar b_{12}\), the line of
regression represented by Eq. (1) will be a straight line in this
plane. The vertical deviation of the point P from the line of
regression will be equal to

$$v = X_1 - X_1' = X_1 - \bar a_{1.2} - \bar b_{12}X_2$$

For each pair of \(X_1, X_2\) values a vertical deviation from the line
of regression can thus be calculated. A sample of N
pairs of values can thus be translated into a sample set of deviations
from the line of regression. If the values assigned to
\(\bar a_{1.2}\) and \(\bar b_{12}\) are such as to make the logarithm of the probability
of the corresponding set of deviations a maximum, they will
also be such as to make the logarithm of the probability of the set
of sample \(X_1, X_2\) pairs of values a maximum, and vice versa.


P 

-rr 


xiWi 


Fig. 112.- 


-Graph of a vertical deviation of a point from the lme of regression 
of Xi on X<i. 


Since the population is assumed to be a normal bivariate
population, it follows that the deviations from the line of regression
are normally distributed with a mean of zero and a standard
deviation equal to the population first-order standard deviation
\(\bar\sigma_{1.2}\). The probability of each deviation will thus be of the form

$$dP(v) = \frac{1}{\bar\sigma_{1.2}\sqrt{2\pi}}\, e^{-v^2/2\bar\sigma_{1.2}^2}\, dv \tag{2}$$

Since the sample is assumed to be a random one, the various
deviations will be independent of each other, and the probability
of the whole sample of deviations will be the product of their
individual probabilities. Inasmuch as exponents are added in
multiplication, the probability of the sample set of deviations
will be given by the equation

$$dP(v_1, v_2, v_3, \ldots, v_N) = \frac{1}{(\bar\sigma_{1.2}\sqrt{2\pi})^N} \exp\left(-\frac{v_1^2 + v_2^2 + \cdots + v_N^2}{2\bar\sigma_{1.2}^2}\right) dv_1\, dv_2 \cdots dv_N$$

which may be written

$$dP(v_1, v_2, v_3, \ldots, v_N) = \frac{1}{(\bar\sigma_{1.2}\sqrt{2\pi})^N} \exp\left(-\frac{\sum v^2}{2\bar\sigma_{1.2}^2}\right) dv_1\, dv_2 \cdots dv_N \tag{3}$$


But each v is of the form \(X_1 - \bar a_{1.2} - \bar b_{12}X_2\), so that the exponent
of e can be written \(-\sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)^2/2\bar\sigma_{1.2}^2\). According to the
method of maximum likelihood, the values of \(\bar a_{1.2}\) and \(\bar b_{12}\) are
to be so chosen that the logarithm of Eq. (3) is a maximum.

Since the logarithm of the first factor, \(1/(\bar\sigma_{1.2}\sqrt{2\pi})^N\), is independent of
\(\bar a_{1.2}\) and \(\bar b_{12}\), it may be disregarded in determining their
maximum-likelihood estimates. The logarithm of the second part of Eq. (3) is
merely the exponent of e, since \(\log_e e^x = x\) by definition. Hence
the logarithm of Eq. (3) will be a maximum when

$$-\sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)^2$$

is a maximum, or \(\sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)^2\) is a minimum.

Accordingly the mathematical problem is so to choose \(\bar a_{1.2}\)
and \(\bar b_{12}\) that \(\sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)^2\) is a minimum. It will be
recognized that this is the criterion of least squares. That is, if
a line of regression is fitted to a sample set of data so that the sum
of the squares of the deviations from the line is a minimum, then
the values of the regression statistics so obtained are the
maximum-likelihood estimates of the population regression parameters.

If \(\sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)^2\) is to be a minimum, its
partial derivatives with respect to \(\bar a_{1.2}\) and \(\bar b_{12}\) must be zero.
These two conditions give the equations:

$$\begin{aligned} \sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2) &= 0 \\ \sum(X_1 - \bar a_{1.2} - \bar b_{12}X_2)X_2 &= 0 \end{aligned} \tag{4}$$
If \(\tilde a_{1.2}\) and \(\tilde b_{12}\) represent the maximum-likelihood estimates of \(\bar a_{1.2}\)
and \(\bar b_{12}\), respectively, as given by Eq. (4), then

$$\begin{aligned} \tilde a_{1.2} &= \bar X_1 - \tilde b_{12}\bar X_2 \\ \tilde b_{12} &= \frac{\sum X_1X_2 - N\bar X_1\bar X_2}{\sum X_2^2 - N\bar X_2^2} \end{aligned} \tag{5}$$

The sample least-squares values of \(a_{1.2}\) and \(b_{12}\) are thus the
maximum-likelihood estimates of the population parameters.
Similarly, the sample least-squares values of \(a_{2.1}\) and \(b_{21}\) are the
maximum-likelihood estimates of \(\bar a_{2.1}\) and \(\bar b_{21}\).
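Eq. (5) translates directly into code. The sketch below, with hypothetical data and an illustrative function name, computes the two estimates and then checks that the residuals satisfy both conditions of Eq. (4).

```python
def least_squares_line(x1, x2):
    """Least-squares (maximum-likelihood) estimates of a_1.2 and b_12 for the
    regression of X1 on X2, computed as in Eq. (5)."""
    n = len(x1)
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    sum_x1x2 = sum(a * b for a, b in zip(x1, x2))
    sum_x2sq = sum(b * b for b in x2)
    b12 = (sum_x1x2 - n * mean1 * mean2) / (sum_x2sq - n * mean2 ** 2)
    a12 = mean1 - b12 * mean2
    return a12, b12


# Hypothetical paired observations (X1 dependent, X2 independent)
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x1 = [2.1, 3.9, 6.2, 7.8, 10.1]
a12, b12 = least_squares_line(x1, x2)

# Both sums in Eq. (4) should vanish at the estimates
resid = [y - a12 - b12 * x for y, x in zip(x1, x2)]
print(f"a12 = {a12:.4f}, b12 = {b12:.4f}")
print(f"sum of residuals = {sum(resid):.2e}")
print(f"sum of residuals * X2 = {sum(r * x for r, x in zip(resid, x2)):.2e}")
```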

More Than Two Variables. In like manner, it can be shown
that the sample least-squares values of the regression coefficients
of planes and hyperplanes of regression are the maximum-likelihood
estimates of the corresponding population regression
parameters. Thus the maximum-likelihood estimates of \(\bar a_{1.23\ldots}\),
\(\bar b_{12.3\ldots}\), \(\bar b_{13.2\ldots}\), ... are given by the solutions of the least-squares
equations

$$\begin{aligned} \sum(X_1 - \tilde a_{1.23\ldots} - \tilde b_{12.3\ldots}X_2 - \tilde b_{13.2\ldots}X_3 - \cdots) &= 0 \\ \sum(X_1 - \tilde a_{1.23\ldots} - \tilde b_{12.3\ldots}X_2 - \tilde b_{13.2\ldots}X_3 - \cdots)X_2 &= 0 \\ \sum(X_1 - \tilde a_{1.23\ldots} - \tilde b_{12.3\ldots}X_2 - \tilde b_{13.2\ldots}X_3 - \cdots)X_3 &= 0 \\ &\;\;\vdots \end{aligned}$$

Similar equations hold for the maximum-likelihood estimates of
\(\bar a_{2.13\ldots}\), \(\bar b_{21.3\ldots}\), \(\bar b_{31.2\ldots}\), etc.
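The least-squares equations above are linear in the unknown constants, so any standard method of solving simultaneous equations will do. The sketch below uses Gaussian elimination in plain Python; the function names are illustrative, and the data are hypothetical, constructed to lie exactly on a known plane so that the solution can be checked.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, size))
        solution[r] = (aug[r][size] - tail) / aug[r][r]
    return solution


def regression_plane(x1, independents):
    """Return [a_1.23..., b_12.3..., b_13.2..., ...] from the least-squares equations.

    Each equation sets the sum of (residual * multiplier) to zero, the
    multiplier being 1 for the first equation and X2, X3, ... for the others.
    """
    columns = [[1.0] * len(x1)] + independents
    matrix = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * v for u, v in zip(ci, x1)) for ci in columns]
    return solve(matrix, rhs)


# Hypothetical data lying exactly on the plane X1 = 1 + 2*X2 - 0.5*X3
x2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x1 = [1 + 2 * u - 0.5 * v for u, v in zip(x2, x3)]

coef = regression_plane(x1, [x2, x3])
print([round(c, 6) for c in coef])
```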

SAMPLING DISTRIBUTIONS OF REGRESSION STATISTICS 

Two Variables. The Sampling Distributions of \(\tilde a_{1.2}\), \(\tilde b_{12}\), \(\tilde a_{2.1}\), and
\(\tilde b_{21}\). The sampling distributions of regression statistics calculated
by the method of least squares (i.e., the sampling distributions
of the maximum-likelihood estimates of the regression parameters)
are similar to the sampling distribution of the mean.
If the population is normal, these distributions are normal.
The means of the sampling distributions are the population
values, and the standard deviations are the population higher-order
standard deviations divided by \(\sqrt N\) times some
function of the independent variable. These general formulas
will first be explained and illustrated with reference to two
variables.

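For a line of regression the maximum-likelihood estimates of
\(\bar a_{1.2}\) and \(\bar b_{12}\) are given by Eq. (5). The sampling distributions
of these estimates (\(\tilde a_{1.2}\) and \(\tilde b_{12}\)) are normal with means equal
to the population values \(\bar a_{1.2}\) and \(\bar b_{12}\) and standard deviations
equal to \(\bar\sigma_{1.2}/\sqrt N\) and \(\bar\sigma_{1.2}/(\sigma_2\sqrt N)\), respectively. Similarly, the sampling
distributions of \(\tilde a_{2.1}\) and \(\tilde b_{21}\) are normal with means equal to
\(\bar a_{2.1}\) and \(\bar b_{21}\) and standard deviations equal to \(\bar\sigma_{2.1}/\sqrt N\) and \(\bar\sigma_{2.1}/(\sigma_1\sqrt N)\),
respectively. (In the formulas for \(\tilde a_{1.2}\) and \(\tilde a_{2.1}\), the independent
variable is measured from its mean.)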
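The standard-deviation formula for \(\tilde b_{12}\) can be checked by simulation. The sketch below assumes, as the chapter does later, that the values of \(X_2\) are held fixed from sample to sample while \(X_1\) varies at random about a normal population line; the population values, the \(X_2\) values, and the number of replications are all hypothetical. It compares the observed standard deviation of the sample slopes with \(\bar\sigma_{1.2}/(\sigma_2\sqrt N)\), where \(\sigma_2\) is computed with divisor N.

```python
import math
import random
import statistics


def slope(x1, x2):
    """Least-squares value of b_12 (regression of X1 on X2)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    sxx = sum((b - m2) ** 2 for b in x2)
    return sxy / sxx


random.seed(1)
A_POP, B_POP, SD_POP = 10.0, 0.8, 3.0      # hypothetical population line and first-order s.d.
x2 = [float(v) for v in range(1, 26)]      # fixed X2 values, N = 25
n = len(x2)
mean2 = sum(x2) / n
sigma2 = math.sqrt(sum((v - mean2) ** 2 for v in x2) / n)   # divisor N

slopes = []
for _ in range(10_000):
    x1 = [A_POP + B_POP * v + random.gauss(0.0, SD_POP) for v in x2]
    slopes.append(slope(x1, x2))

simulated_sd = statistics.pstdev(slopes)
formula_sd = SD_POP / (sigma2 * math.sqrt(n))
print(f"mean of slopes: {statistics.fmean(slopes):.4f}")
print(f"simulated s.d.: {simulated_sd:.4f}  formula: {formula_sd:.4f}")
```

The mean of the simulated slopes settles near the population value and their standard deviation near the formula value, which is the behavior the text describes.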

If the first-order standard deviation of the population is not
known, it must be estimated from the sample, and the t distribution
must be used in place of the normal curve, the appropriate
value of n being N − 2. If the sample is large, the t distribution
is so close to the normal distribution that the latter may be
used instead. Hence, in the case of large samples it makes little
difference whether or not the population first-order
variance is known. In general, all the discussion of Chap. XI
concerning the sampling fluctuations in the mean is applicable
to the sampling fluctuations in \(\tilde a_{1.2}\), \(\tilde b_{12}\), \(\tilde a_{2.1}\), and \(\tilde b_{21}\).

Use of Distributions in Testing Hypotheses. To illustrate the
testing of a hypothesis regarding a regression coefficient of a
normal bivariate population, consider again the data on grades
of Mount Holyoke students in first-semester English, \(X_2\), and
second-semester English, \(X_1\). It was seen in Chap. XIV of
Smith and Duncan's Elementary Statistics and Applications
that the value of \(\tilde b_{12}\) is .8322. Since grading in the two courses
was apparently on a similar basis, it might be expected that a
student whose grade exceeded the mean grade in first-semester
English by 20 points, say, would tend to exceed the mean grade
in second-semester English by an equal amount, so that the
value of \(\bar b_{12}\) might be expected to be 1.00. Let the hypothesis
that \(\bar b_{12} = 1\), therefore, be tested in the light of the sample
result.


Since the population first-order standard deviation is not
known, it is necessary to estimate it from the sample. As
indicated below,¹ the maximum-likelihood estimate of the
population first-order standard deviation is derived from the
sample first-order standard deviation by multiplying it by
\(\sqrt{N/(N-2)}\). Since the sample first-order standard deviation is


1 See p. 383 
19.53,* the maximum-likelihood estimate of the first-order
standard deviation of the population is thus

$$\tilde\sigma_{1.2} = 19.53\sqrt{\tfrac{81}{79}} = 19.78$$

The estimate of the standard error of \(\tilde b_{12}\) is consequently

$$\tilde\sigma_{\tilde b_{12}} = \frac{\tilde\sigma_{1.2}}{\sigma_2\sqrt N} = \frac{19.78}{(47.29)(9)} = .04647$$

since \(\sigma_2 = 47.29\) and N = 81.

The difference between the sample value, .8322, and the
hypothetical value, 1.00, is 0.168, which is more than three
times the estimated standard error. Since the sample is large,
the normal curve may be used to test the hypothesis. The
lower .05 point on a normal curve comes at \(-1.645\sigma\). Hence the
sample value obviously falls far below the .05 point, and therefore
the hypothesis must be rejected. The value of \(\bar b_{12}\) is almost
certainly not 1.00.
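The arithmetic of this test can be reproduced in a few lines. The figures are those quoted in the text and the variable names are illustrative. Because the code carries full precision through the calculation rather than rounding 19.78 first, the standard error it prints may differ from .04647 in the last digit.

```python
import math

N = 81
SAMPLE_SD_12 = 19.53   # sample first-order standard deviation
SIGMA_2 = 47.29        # standard deviation of X2
B12_SAMPLE = 0.8322
B12_HYPOTHESIS = 1.00

# Maximum-likelihood estimate of the population first-order standard deviation
sd_12_ml = SAMPLE_SD_12 * math.sqrt(N / (N - 2))
# Estimated standard error of the sample regression coefficient
se_b12 = sd_12_ml / (SIGMA_2 * math.sqrt(N))
# Standardized difference, referred to the normal curve (large sample)
z = (B12_SAMPLE - B12_HYPOTHESIS) / se_b12
reject = z < -1.645    # region of rejection at the lower end, coefficient of risk .05

print(f"ML estimate of first-order s.d.: {sd_12_ml:.2f}")
print(f"standard error of b12: {se_b12:.5f}")
print(f"z = {z:.2f}, reject hypothesis: {reject}")
```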

Use of Sampling Distribution in Determining Confidence Limits.
Confidence limits can be obtained for \(\bar b_{12}\) as follows: Suppose
first that the confidence coefficient is set at .95; in other words,
that the confidence limits are to be so chosen that the chances
of their including the population value are 95 out of 100. Further,
let the confidence interval be such that the chance of failing
to include the population value because the interval is set too
high is just equal to the chance of failing to include the population
value because the interval is set too low. In short, let the desired
confidence interval be an unbiased interval with a confidence
coefficient of .95.

For the Mount Holyoke data the value of \(\tilde b_{12}\) was .8322, and
the maximum-likelihood estimate of the standard error of \(\tilde b_{12}\)
has just been seen to equal .04647. Since the sample was large
(N = 81) and the sampling distribution can be taken as normal,
an unbiased confidence interval for \(\bar b_{12}\) with a confidence coefficient
of .95 will be given by \(\tilde b_{12} \pm 1.96\tilde\sigma_{\tilde b_{12}}\), which for the given
data yields .8322 ± 1.96(.04647) = 0.9233 and 0.7411. The
desired confidence interval is thus 0.7411 to 0.9233. If the sample
had been small, the t distribution would perforce have been
used. For example, if N = 20, then the desired confidence
limits would have been 0.8322 ± 2.10(.04647), where 2.10 is the


* See Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, p. 363.
.05 value of t with n = 20 − 2 = 18. Had there been but
20 cases in the sample, therefore, the confidence limits would
have been 0.7346 and 0.9298.
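Both intervals follow the same pattern, estimate plus or minus a multiplier times the standard error, with only the multiplier changing. In the sketch below the multipliers 1.96 and 2.10 are copied from the text (the normal curve and the .05 value of t for n = 18) rather than computed, and, as in the text, the same standard error is used for the hypothetical sample of 20. The function name is illustrative.

```python
B12 = 0.8322
SE_B12 = 0.04647   # estimated standard error of b12, from the text


def equal_tailed_interval(estimate, standard_error, multiplier):
    """Return (lower, upper) = estimate -/+ multiplier * standard_error."""
    half_width = multiplier * standard_error
    return estimate - half_width, estimate + half_width


large_sample = equal_tailed_interval(B12, SE_B12, 1.96)   # normal curve, N = 81
small_sample = equal_tailed_interval(B12, SE_B12, 2.10)   # .05 value of t for n = 18
print("N = 81: %.4f to %.4f" % large_sample)
print("N = 20: %.4f to %.4f" % small_sample)
```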

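More than Two Variables. The argument for two variables
can be extended without difficulty to samples involving more than
two variables. Thus, as already indicated, the sampling distribution
of any maximum-likelihood estimate of a regression
parameter of a normal population is normal, with a
mean equal to the population value of the parameter and a standard deviation equal
to \(\bar\sigma_{1.jk\ldots n}\) divided by \(\sqrt N\) times some function of the independent
variables. In the case of two variables the function of the
independent variable that was used was its standard deviation.
Thus \(\bar\sigma_{\tilde b_{12}}\) equaled \(\bar\sigma_{1.2}/(\sigma_2\sqrt N)\). For more than two variables the
standard-error formulas are equally simple, the form of their
equations being symmetrical, as follows:

$$\bar\sigma_{\tilde b_{12.3}} = \frac{\bar\sigma_{1.23}}{\sigma_{2.3}\sqrt N}, \qquad \bar\sigma_{\tilde b_{13.2}} = \frac{\bar\sigma_{1.23}}{\sigma_{3.2}\sqrt N}$$

$$\bar\sigma_{\tilde b_{12.34}} = \frac{\bar\sigma_{1.234}}{\sigma_{2.34}\sqrt N}, \quad \ldots, \quad \bar\sigma_{\tilde b_{14.23}} = \frac{\bar\sigma_{1.234}}{\sigma_{4.23}\sqrt N}$$

and, in general,

$$\bar\sigma_{\tilde b_{1j.kl\ldots n}} = \frac{\bar\sigma_{1.jkl\ldots n}}{\sigma_{j.kl\ldots n}\sqrt N}$$

The standard deviations \(\sigma_{j.kl\ldots n}\) can be calculated by the equation¹

$$\sigma_{j.kl\ldots n}^2 = \sigma_j^2(1 - r_{jk}^2)(1 - r_{jl.k}^2)\cdots(1 - r_{jn.kl\ldots}^2)$$

1 Cf. Smith and Duncan, op. cit., p. 436.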
If the value of \(\bar\sigma_{1.23\ldots n}\) is not known, it must be estimated
from the sample, and the t distribution must then be used with
n = N − m, where m is the number of regression statistics in the
regression equation. If the sample is relatively large, however,
say N − m > 30, the normal curve can be used with sufficient
accuracy, even if \(\bar\sigma_{1.23\ldots n}\) is estimated from the sample value.

For example, in Chap. XVII of Elementary Statistics and Applications
the sample value of \(\tilde b_{12.34}\) was found to be .7211, where \(X_1\) represents
a student's grade in second-semester English, \(X_2\) her grade in
first-semester English, \(X_3\) her verbal scholastic-aptitude test score,
and \(X_4\) her grade in the College Board Entrance Examination in English.
The sample value of \(\sigma_{1.234}\) was 18.63, and the maximum-likelihood
estimate of \(\bar\sigma_{1.234}\) is¹ found by the equation

$$\tilde\sigma_{1.234} = 18.63\sqrt{\tfrac{81}{77}} = 19.55$$


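The maximum-likelihood estimate of the standard error of
\(\tilde b_{12.34}\) is

$$\tilde\sigma_{\tilde b_{12.34}} = \frac{\tilde\sigma_{1.234}}{\sigma_{2.34}\sqrt N}$$

The value of \(\sigma_{2.34}\) may be obtained from the equation²

$$\sigma_{j.kl} = \sigma_j\sqrt{1 - r_{jk}^2}\,\sqrt{1 - r_{jl.k}^2}$$

In the present instance this gives

$$\sigma_{2.34} = \sigma_2\sqrt{1 - r_{23}^2}\,\sqrt{1 - r_{24.3}^2} = 47.29(.8046)(.9192) = 34.98$$

The values of \(\sqrt{1 - r_{23}^2}\) and \(\sqrt{1 - r_{24.3}^2}\) are derived from the data
for \(r_{23}\) and \(r_{24.3}\) given in Elementary Statistics and Applications.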

Hence the maximum-likelihood estimate of the standard error
of \(\tilde b_{12.34}\) is

$$\tilde\sigma_{\tilde b_{12.34}} = \frac{19.55}{(34.98)\sqrt{81}} = .0621$$


1 Cf. p. 376. 

2 Cf. p. 378; also pp. 445, 456-458. To obtain values of \(\sqrt{1 - r^2}\),
the sine and cosine tables may be used; for if \(r = \sin\theta\), then
\(\sqrt{1 - r^2} = \cos\theta\).
So much for the preliminary calculations. Suppose the
hypothesis is set up that grades in second-semester English
tend to show less variation than those in first-semester English.
[...] It is commonly believed that after allowance is made
[...] school training [...]. This gives a hypothesis that the
value of \(\bar b_{12.34}\) is .7500.

The value of \(\tilde b_{12.34}\) calculated from the sample is .7211. To test
the hypothesis, the difference between the sample value and the
hypothetical value for \(\bar b_{12.34}\) is compared with the standard deviation
of \(\tilde b_{12.34}\). For the given data this yields the following result:

$$\frac{x}{\tilde\sigma} = \frac{.7211 - .7500}{.0621} = \frac{-.0289}{.0621} = -.465$$

Let the region of rejection be taken as values of \(x/\tilde\sigma < -1.645\).
The sample value \(x/\tilde\sigma = -.465\) obviously does not fall in the
region of rejection, and the hypothesis that \(\bar b_{12.34} = .7500\) is
therefore not rejected. If the sample had been small, then
the t distribution, with n = N − 4, would have been used.

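Confidence Limits. If the confidence coefficient is set at .95,
unbiased confidence limits for \(\bar b_{12.34}\) are given by
\(\tilde b_{12.34} \pm 1.96\tilde\sigma_{\tilde b_{12.34}}\). For the Mount Holyoke data these are
.7211 ± 1.96(.0621), that is, .599 and .843. Hence the chances are
.95 that the range from .599 to .843 includes the population value
of \(\bar b_{12.34}\). If the sample had been small, the .05 value of t would
have had to be used in place of 1.96.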
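The calculations for the multiple-regression coefficient can be collected in one place. The sketch below takes the inputs quoted in the text (including the estimate 19.55 and the two square-root factors) as given and chains them: first \(\sigma_{2.34}\), then the standard error, the test ratio, and the .95 confidence limits. Variable names are illustrative.

```python
import math

N = 81
SD_1_234_ML = 19.55              # ML estimate of the population first-order s.d., from the text
SIGMA_2 = 47.29
ROOT_1_MINUS_R23_SQ = 0.8046     # sqrt(1 - r23^2)
ROOT_1_MINUS_R24_3_SQ = 0.9192   # sqrt(1 - r24.3^2)

# sigma_2.34 = sigma_2 * sqrt(1 - r23^2) * sqrt(1 - r24.3^2)
sigma_2_34 = SIGMA_2 * ROOT_1_MINUS_R23_SQ * ROOT_1_MINUS_R24_3_SQ
se_b = SD_1_234_ML / (sigma_2_34 * math.sqrt(N))

B_SAMPLE, B_HYPOTHESIS = 0.7211, 0.7500
z = (B_SAMPLE - B_HYPOTHESIS) / se_b
reject = z < -1.645                      # region of rejection at the lower end
lower, upper = B_SAMPLE - 1.96 * se_b, B_SAMPLE + 1.96 * se_b

print(f"sigma_2.34 = {sigma_2_34:.2f}, standard error = {se_b:.4f}")
print(f"z = {z:.3f}, reject: {reject}")
print(f".95 confidence limits: {lower:.3f} to {upper:.3f}")
```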





SAMPLING FLUCTUATIONS IN LINE OR PLANE OF REGRESSION 
Lines of Regression.¹ For any given value of \(X_2\) the sample
regression value of \(X_1'\) is a function of \(\tilde a_{1.2}\) and \(\tilde b_{12}\), given by the
sample regression equation \(X_1' = \tilde a_{1.2} + \tilde b_{12}X_2\). Hence sampling
fluctuations² in \(\tilde a_{1.2}\) and \(\tilde b_{12}\) will cause sampling fluctuations in \(X_1'\).
Let \(X_2\) be measured from its mean value, so that the regression
equation can be written \(X_1' = \tilde a_{1.2} + \tilde b_{12}x_2\). In this case,
\(\tilde a_{1.2} = \bar X_1\), but to avoid a shift in notation it will continue to be
called \(\tilde a_{1.2}\). Under these conditions, sampling fluctuations in
\(X_1'\) for an arbitrarily selected value³ of \(x_2\) will be normally distributed
with a mean equal to the population value of \(X_1'\) for the
selected value of \(x_2\), that is, \(\bar X_1' = \bar a_{1.2} + \bar b_{12}x_2\), and a variance
equal to the variance of \(\tilde a_{1.2}\) plus the variance of \(\tilde b_{12}\) multiplied⁴
by \(x_2^2\). These relationships may be written succinctly as follows:

$$\begin{aligned} \bar X_1' &= \bar a_{1.2} + \bar b_{12}x_2 \\ \bar\sigma_{X_1'} &= \sqrt{\bar\sigma_{\tilde a_{1.2}}^2 + \bar\sigma_{\tilde b_{12}}^2 x_2^2} \end{aligned} \tag{9}$$

where \(\bar X_1'\) refers to the mean of sample values of \(X_1'\) for the
selected \(x_2\) and \(\bar\sigma_{X_1'}\) refers to the standard error.

With the help of these formulas the sampling fluctuations in
\(X_1'\) can be analyzed in the same manner as the sampling fluctuations
in a mean or a regression coefficient were analyzed. For
example, a particular hypothesis regarding a certain value of \(\bar X_1'\)
can be tested by taking the ratio of the difference between the
hypothetical value and the sample value to the estimated standard
error of this difference. If the sample is small and the
standard error has to be estimated from the data, this ratio is
looked up in a t table with n = N − 2. If the sample is large,
or if the standard error is based upon population values, then the
normal curve may be used.

1 In this section and the rest of this chapter, \(X_1\) is taken as the dependent
variable. The argument is valid for any dependent variable, however, and
formulas in which \(X_2\) or \(X_3\) is taken as the dependent variable may be
derived by interchanging the subscripts 2 and 1 or 3 and 1.

2 The assumption underlying the argument of this section is that the
values of \(X_2\) are the same from sample to sample but the values of \(X_1\) vary
at random. The sampling is therefore of those values of \(X_1\) associated with
given values of \(X_2\).

3 See footnote 2, page 380.

4 It will be noted that the "variables" in this case are \(\tilde a_{1.2}\) and \(\tilde b_{12}x_2\) (the
selected value of \(x_2\) being a constant, a sort of coefficient for \(\tilde b_{12}\)). Equations
(9) are thus merely applications of the theorem that the mean of the
sum of two normally distributed variables is the sum of their means and the
variance of their sum is the sum of their variances if the variables are
independent (cf. pp. 419-421). It may be shown that \(\tilde a_{1.2}\) and \(\tilde b_{12}x_2\) are
independent in their sampling fluctuations.

Here again, unbiased confidence limits for \(\bar X_1'\) can be determined
as \(X_1' \pm t_{.05}\tilde\sigma_{X_1'}\) from the sample if the sample is small,
or as \(X_1' \pm 1.96\tilde\sigma_{X_1'}\) if the sample is large. The symbol \(t_{.05}\) refers to
the .05 point in a t table for the given value of n. The analysis
is essentially the same as before and will not be repeated here.

One aspect of the sampling fluctuations in \(X_1'\), however,
deserves further consideration. Since the standard error of
\(X_1'\) is a function of the values of the independent variables, confidence
limits for \(\bar X_1'\) will vary with its location on the line or
plane of regression. Furthermore, it is to be noted from Eqs. (9)
that the closer \(X_2\) is to its mean, the closer the confidence limits
lie to the sample value of \(X_1'\). The loci of the confidence
limits for the points on the line of regression yield confidence
limits for the line itself. If the confidence coefficient is .95, it
may be said that the chances are .95 that these limiting loci
include the population line of regression.

If the sample is small and the standard errors have to be
estimated from the data, the equations of the limiting loci with
a .95 confidence coefficient are

$$X_1' = \tilde a_{1.2} + \tilde b_{12}x_2 \pm t_{.05}\sqrt{\tilde\sigma_{\tilde a_{1.2}}^2 + \tilde\sigma_{\tilde b_{12}}^2 x_2^2} \tag{10}$$

where \(t_{.05}\)¹ is the .05 point of a t table for n = N − 2. If the
sample is large, 1.96 can be used in place of \(t_{.05}\). A graph of the
limiting loci for the line of regression of \(X_1\) on \(X_2\) for the Mount
Holyoke data is shown in Fig. 113, and the numerical values for
the graph are given in Table 44. These were computed from the
equations

$$X_1' = 217.4 + 0.8322x_2 \pm 1.96\sqrt{4.83 + .002159x_2^2}$$

which are the equations for the limiting loci. Here \(4.83 = \tilde\sigma_{\tilde a_{1.2}}^2\)
and is calculated from the equation

$$\tilde\sigma_{\tilde a_{1.2}}^2 = \frac{\tilde\sigma_{1.2}^2}{N}$$

1 This is the point on each tail that gives an individual tail area of .025
and a combined tail area of .05 (see Table VII in the Appendix).
where \(\tilde\sigma_{1.2}^2\) is the maximum-likelihood estimate of the population
first-order variance and is equal to the sample first-order variance
multiplied by N/(N − 2). For the given data,

$$\tilde\sigma_{1.2}^2 = \frac{(81)(19.53)^2}{79} = 391.07$$

and therefore \(\tilde\sigma_{\tilde a_{1.2}}^2 = 391.07/81 = 4.83\). The quantity 0.002159
is \(\tilde\sigma_{\tilde b_{12}}^2\) and is equal¹ to the square of .04647, which was found
above to be the standard error of \(\tilde b_{12}\). The numerical values
from which Fig. 113 was constructed are given in Table 44.
It may be noted again that the limiting loci are closest to the
sample regression line at the mean of \(X_2\) (i.e., where \(x_2 = 0\))
and tend to swing away from this line as \(X_2\) departs further and
further from its mean value.

Fig. 113. Maximum-likelihood estimate and confidence limits for line of
regression. The chances are .95 that these limiting loci include the population
line of regression.

1 Actually, the value 0.002159 was calculated from the squares of the
components and differs slightly from the square of 0.04647 itself.

Table 44. Solutions for the Equation Showing Confidence Limits
for a Line of Regression

Equation of confidence limits: \(X_1' = 217.4^{*} + 0.8322x_2 \pm 1.96\sqrt{4.83 + 0.002159x_2^2}\)

  (1)     (2)       (3)         (4)         (5)      (6)      (7)          (8)           (9)
  x2      x2²    0.8322x2  0.002159x2²  (4)+4.83   √(5)   1.96(6)   217.4 + (3) ± (7)   X2
                                                                     Upper    Lower
 -140   19,600    -116.5      42.32       47.15    6.866    13.4     114.3     87.5      64
 -120   14,400     -99.9      31.09       35.92    5.993    11.7     129.2    105.8      84
 -100   10,000     -83.2      21.59       26.42    5.140    10.1     144.3    124.1     104
  -80    6,400     -66.6      13.82       18.65    4.319     8.5     159.3    142.3     124
  -60    3,600     -49.9       7.77       12.60    3.550     7.0     174.5    160.5     144
  -40    1,600     -33.3       3.45        8.28    2.877     5.6     189.7    178.5     164
  -20      400     -16.6       0.86        5.69    2.385     4.7     205.5    196.1     184
    0        0       0.0       0.00        4.83    2.198     4.3     221.7    213.1     204
   20      400      16.6       0.86        5.69    2.385     4.7     238.7    229.3     224
   40    1,600      33.3       3.45        8.28    2.877     5.6     256.3    245.1     244
   60    3,600      49.9       7.77       12.60    3.550     7.0     274.3    260.3     264
   80    6,400      66.6      13.82       18.65    4.319     8.5     292.5    275.5     284
  100   10,000      83.2      21.59       26.42    5.140    10.1     310.7    290.5     304
  120   14,400      99.9      31.09       35.92    5.993    11.7     329.0    305.6     324
  140   19,600     116.5      42.32       47.15    6.866    13.4     347.3    320.5     344

* \(\bar X_1\) = 217.4 and \(\bar X_2\) = 204.1; cf. Smith, J. G., and A. J. Duncan, Elementary Statistics
and Applications, p. 360. The line of regression plotted in Fig. 113 is based upon these values.
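Table 44 can be regenerated from the equation at its head. The sketch below recomputes the estimate and the two limiting loci at each tabulated deviation \(x_2\); because it does not round the intermediate columns, a few entries may differ from the printed table by 0.1. Constant and function names are illustrative.

```python
import math

MEAN_X1_LINE = 217.4   # sample regression value at the mean of X2
MEAN_X2 = 204.1
B12 = 0.8322
VAR_A = 4.83           # estimated sampling variance of a_1.2
VAR_B = 0.002159       # estimated sampling variance of b_12


def limiting_loci(x2, multiplier=1.96):
    """Return (estimate, upper, lower) at deviation x2 = X2 - mean of X2."""
    estimate = MEAN_X1_LINE + B12 * x2
    half_width = multiplier * math.sqrt(VAR_A + VAR_B * x2 ** 2)
    return estimate, estimate + half_width, estimate - half_width


rows = {x2: limiting_loci(x2) for x2 in range(-140, 141, 20)}
for x2, (estimate, upper, lower) in rows.items():
    print(f"{x2:5d} {MEAN_X2 + x2:7.1f} {upper:7.1f} {lower:7.1f}")
```

The band is narrowest at \(x_2 = 0\) and widens symmetrically in both directions, which is the "swing away" from the sample line noted in the text.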

Planes of Regression. The argument concerning sampling
fluctuations of a line of regression can be extended easily to a
plane of regression. Thus, if samples are selected so that the
values of the independent variables \(X_2, X_3, \ldots\) are the same
for each set of samples¹ and only the values of \(X_1\) are allowed
1 Suppose, for example, that N = 3 and the first set of sample values is

Dependent Variable    Independent Variables
       X1              X2     X3     X4
       10               2      3      5
       12               3      1      6
       14               4      0      7

Then all other samples of 3 would have to be selected so that \(X_2\), \(X_3\), and
to vary at random from sample to sample, then the mean value
of \(X_1'\) for an arbitrarily selected set of values of \(X_2, X_3, \ldots\)
will be

$$\bar X_1' = \bar a_{1.23\ldots} + \bar b_{12.3\ldots}x_2 + \bar b_{13.2\ldots}x_3 + \cdots \tag{11}$$

where \(x_2 = X_2 - \bar X_2\), \(x_3 = X_3 - \bar X_3\), ..., and \(\bar X_1'\) refers to the
mean of the sample values of \(X_1'\) for the selected values of \(X_2, X_3, \ldots\).
The standard error of these sample values of \(X_1'\)
would be

$$\bar\sigma_{X_1'} = \sqrt{\bar\sigma_{\tilde a_{1.23\ldots}}^2 + \bar\sigma_{\tilde b_{12.3\ldots}}^2 x_2^2 + \bar\sigma_{\tilde b_{13.2\ldots}}^2 x_3^2 + \cdots} \tag{12}$$

The distribution of \(X_1'\) will be normal, and this may be used to
test hypotheses and determine confidence limits if \(\bar\sigma_{1.23\ldots}\) is
known or if it is estimated and the sample is large. If the
sample is small, the t table must be used with n = N − m,
where m is the number of regression statistics in the regression
equation.

The limiting loci for a plane of regression are given by the
sample plane plus or minus the .05 value of t for n = N − m times the
standard error of \(X_1'\). The equations are

$$X_1' = \tilde a_{1.23\ldots} + \tilde b_{12.3\ldots}x_2 + \tilde b_{13.2\ldots}x_3 + \cdots \pm t_{.05}\sqrt{\tilde\sigma_{\tilde a_{1.23\ldots}}^2 + \tilde\sigma_{\tilde b_{12.3\ldots}}^2 x_2^2 + \tilde\sigma_{\tilde b_{13.2\ldots}}^2 x_3^2 + \cdots} \tag{13}$$

where \(t_{.05}\) is the .05 point of a t table for n = N − m. For
large samples (that is, N − m > 30), ±1.96 may be substituted
for \(\pm t_{.05}\).

SAMPLING DISTRIBUTION OF HIGHER-ORDER VARIANCES 

The sampling distribution of a higher-order variance is essentially
the same as that of a zero-order variance, the form of the
distribution being that of the \(\chi^2\) distribution. More precisely,
it may be said that the statistic \(N\sigma_{1.23\ldots}^2/\bar\sigma_{1.23\ldots}^2\) will vary from sample
to sample in the manner of the \(\chi^2\) distribution whose n is equal
to N − m, m being the number of regression statistics in the
regression equation. Thus the only difference between the
sampling distribution of a zero-order variance and that of a

\(X_4\) had the values 2, 3, 5; 3, 1, 6; and 4, 0, 7, respectively, in each set of
sample values. Only the values of the dependent variable \(X_1\) would be
allowed to vary at random from one sample set to another.
higher-order variance is in the value of n in the \(\chi^2\) equation.
In the first case, \(N\sigma^2/\bar\sigma^2\) has a \(\chi^2\) distribution with n = N − 1,
and in the second case \(N\sigma_{1.23\ldots}^2/\bar\sigma_{1.23\ldots}^2\) has a \(\chi^2\) distribution with
n = N − m.

All the sampling analysis regarding zero-order variance may
be applied directly to higher-order variances. For example,
if the sample variance \(\sigma_{1.2}^2\) is \((21.02)^2 = 441.84\), and if there are
10 cases in the sample, the hypothesis that the population
variance \(\bar\sigma_{1.2}^2\) is 500, say, could be tested by looking up the value
of \(N\sigma_{1.2}^2/\bar\sigma_{1.2}^2\) in a \(\chi^2\) table with n = N − 2. For the given figures,
this would give the result \(10(441.84)/500 = 8.84\). On the assumption
that a coefficient of risk of .05 is adopted and that the
region of rejection is taken all at the lower end of the distribution,
it is found that for n = 10 − 2 = 8 the .95 point of the \(\chi^2\)
distribution is 2.733. Since the sample value is larger than
this, the hypothesis would have to be accepted.
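The small-sample test reduces to one ratio and one table look-up. In the sketch below the table value 2.733 is copied from the text rather than computed, so that the example needs nothing beyond plain Python.

```python
N_CASES = 10
SAMPLE_VAR = 21.02 ** 2        # sample first-order variance, about 441.84
HYPOTHETICAL_VAR = 500.0
CHI2_95_POINT_8DF = 2.733      # .95 point of chi-square for n = 8, from a table

chi2 = N_CASES * SAMPLE_VAR / HYPOTHETICAL_VAR
reject = chi2 < CHI2_95_POINT_8DF   # region of rejection at the lower end
print(f"chi-square = {chi2:.2f}, reject: {reject}")
```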

If the sample is larger than 30, say, the normal curve may be
used in place of the \(\chi^2\) distribution. In this case, the value of

$$\frac{\sigma^2 - \bar\sigma^2}{\bar\sigma^2\sqrt{2/N}}$$

would be determined and the result looked up in a
normal table. For example, for the Mount Holyoke data the
size of the sample is 81 and \(\sigma_{1.2}^2 = 441.84\). To test the hypothesis
that the population first-order variance is 490, say, calculate

$$\frac{\sigma_{1.2}^2 - \bar\sigma_{1.2}^2}{\bar\sigma_{1.2}^2\sqrt{2/N}}$$

For the given data this has the value

$$\frac{441.84 - 490}{490\sqrt{2/81}} = -.625$$

Since this is not less than −1.645, the hypothesis must
again be accepted.
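For the large-sample version, the sketch below computes the same standardized ratio for the Mount Holyoke figures. It relies on the approximation implied by the denominator in the text, namely that the standard error of a sample variance is \(\bar\sigma^2\sqrt{2/N}\).

```python
import math

N = 81
SAMPLE_VAR = 441.84
HYPOTHETICAL_VAR = 490.0

z = (SAMPLE_VAR - HYPOTHETICAL_VAR) / (HYPOTHETICAL_VAR * math.sqrt(2 / N))
reject = z < -1.645    # region of rejection at the lower end
print(f"z = {z:.3f}, reject: {reject}")
```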

Confidence limits with a confidence coefficient of .96 can also
be established in the same manner as before. For small samples
set \(N\sigma_{1.2}^2/\bar\sigma_{1.2}^2\) equal first to the upper and then to the lower .02 points
of the \(\chi^2\) distribution for which n = N − m, and solve for \(\bar\sigma^2\).
This will give unbiased confidence limits. For example, if
N = 10 and \(\sigma_{1.2}^2 = 441.84\), the upper and lower limits are given¹ by

$$\bar\sigma_{1.2}^2 = \frac{10(441.84)}{2.032} = 2{,}174 \qquad\text{and}\qquad \bar\sigma_{1.2}^2 = \frac{10(441.84)}{18.168} = 243$$

Limits with only an upper or lower bound can also be derived in a
manner similar to that described for the zero-order case.² The
work will not be duplicated here. For large samples, unbiased
confidence limits are given by

$$\frac{\sigma_{1.2}^2 - \bar\sigma_{1.2}^2}{\bar\sigma_{1.2}^2\sqrt{2/N}} = \pm 1.96$$

For \(\sigma_{1.2}^2 = 441.84\) and N = 81 as in the Mount Holyoke problem, these become

$$\frac{441.84 - \bar\sigma_{1.2}^2}{\bar\sigma_{1.2}^2\sqrt{2/81}} = \pm 1.96$$

which gives \(\bar\sigma_{1.2}^2 = 638\) and \(\bar\sigma_{1.2}^2 = 338\).

Since the sampling distribution of a higher-order variance
is the same as that of a zero-order variance except that in its
\(\chi^2\) form n = N − m instead of N − 1, it follows that the
maximum-likelihood estimate of the corresponding population
higher-order variance is equal to N/(N − m) times the sample
variance instead of N/(N − 1) times. The proof of this is the same as that
given in Chap. XI (pages 290 to 294). In the present instance,
\(\sigma_{1.2}^2\) for the sample equals 441.84. Hence the maximum-likelihood
estimate of the value of \(\bar\sigma_{1.2}^2\) is \(\tilde\sigma_{1.2}^2 = \tfrac{81}{79}(441.84) = 453.03\).
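The three calculations of this section (small-sample limits from the \(\chi^2\) table, large-sample limits from the normal curve, and the maximum-likelihood estimate) are collected in the sketch below. The \(\chi^2\) points 2.032 and 18.168 are copied from the text, and the function names are illustrative.

```python
import math


def small_sample_limits(n_cases, sample_var, chi2_low_point, chi2_high_point):
    """Solve N*s^2 / sigma^2 = chi-square point for sigma^2 at each point.

    The higher chi-square point gives the lower limit, and vice versa."""
    return n_cases * sample_var / chi2_high_point, n_cases * sample_var / chi2_low_point


def large_sample_limits(n_cases, sample_var, z=1.96):
    """Solve (s^2 - sigma^2) / (sigma^2 * sqrt(2/N)) = +/- z for sigma^2."""
    k = z * math.sqrt(2 / n_cases)
    return sample_var / (1 + k), sample_var / (1 - k)


def ml_estimate(n_cases, m, sample_var):
    """N/(N - m) times the sample variance."""
    return n_cases / (n_cases - m) * sample_var


small = small_sample_limits(10, 441.84, 2.032, 18.168)   # chi-square points for n = 8
large = large_sample_limits(81, 441.84)
estimate = ml_estimate(81, 2, 441.84)
print("N = 10: %.0f to %.0f" % small)
print("N = 81: %.0f to %.0f" % large)
print("ML estimate: %.2f" % estimate)
```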


USE OF LINE OF REGRESSION AND 
HIGHER-ORDER STANDARD DEVIATION IN ESTIMATING 
THE DEPENDENT VARIABLE 

The principal use of a line or plane of regression and the
corresponding higher-order standard deviation is to estimate the
dependent variable from the independent variables. For
example, the sample data on Mount Holyoke grades show that a
student's grade in second-semester English (\(X_1\)) is on the average
related to her first-semester English grade (\(X_2\)) by the equation
(line of regression of \(X_1\) on \(X_2\))

$$X_1' = 217.4 + .8322(X_2 - \bar X_2)$$

This equation may therefore be used to forecast a student’s 

1 Note that n = 10 − 2 = 8.

2 See p. 289. 
second-semester grade from her first-semester grade. For
example, if a student gets a first-semester grade of 180, the best
estimate that can be made of her second-semester grade will be

$$X_1' = 217.4 + .8322(180 - 204.1) = 197.4$$

But how reliable is this estimate? It is in answering this
question that use is made of the first-order standard deviation
around the foregoing line of regression. This first-order standard
deviation was found to be \(\sigma_{1.2} = 19.53\). If the foregoing equation
represented the population line of regression and the foregoing
standard deviation was the population first-order standard
deviation, and if the data were normally distributed, then it
could be said that the chances are 95 out of 100 that the actual
second-semester grade will fall within the limits \(197.4 \pm 1.96\bar\sigma_{1.2}\),
that is, between 197.4 + (1.96)(19.53) = 235.7 and

197.4 − (1.96)(19.53) = 159.1.

The foregoing line of regression and standard deviation are not 
those of the population, however, but were obtained from a 
previous sample of 81 grades. Hence additional allowance must 
be made for the sampling fluctuation in the line of regression 
and in the first-order standard deviation. The procedure may be 
outlined as follows: 

It should be noted that the difference between the estimate 
of the second-semester English grade and the actual value that 
occurs will consist of the sum of two independent deviations. 
First, there is the deviation of the sample regression line from the 
population regression line. This is the error that arises from 
taking the regression of the 81 sample cases as the population 
regression. Second, there is the deviation of the actual second- 
semester grade from the population regression value. This is 
the error that arises from taking the mean of a distribution as 
representative of any individual case. The latter will, of 
course, differ from the mean in accordance with the laws of 
sampling. There are thus two sampling errors involved in 
estimating a future grade, one relating to the past sample of 
81 grades, the other to the sampling of the future grade. The 
total error is the sum of these two. Symbolically 





$$X_{1,\text{actual}} - X'_{1,\text{estimated}} = (\bar X_1' - X'_{1,\text{estimated}}) + (X_{1,\text{actual}} - \bar X_1') \tag{14}$$

Previous analysis indicated that sample regression values
\(X_1'\) tend to be normally distributed around the population \(\bar X_1'\)
as a mean with a standard deviation equal to

$$\bar\sigma_{X_1'} = \sqrt{\bar\sigma_{\tilde a_{1.2}}^2 + \bar\sigma_{\tilde b_{12}}^2 x_2^2}$$

Also, on the assumption that the grades form a normal bivariate
population, it can be said that the actual second-semester
grades tend to be normally distributed about the population
regression line with a standard deviation equal to the (population)
first-order standard deviation. Since sample grades in the future
are independent of sample grades in the past, with respect to
deviations from the population regression line, the first
right-hand member of Eq. (14) is independent of the second
right-hand member. Hence the sampling variance of the left-hand member
is the sum of these two independent sampling variances.² Thus

$$\bar\sigma^2_{X_{1,\text{actual}} - X'_{1,\text{estimated}}} = \bar\sigma_{X_1'}^2 + \bar\sigma_{1.2}^2 = \bar\sigma_{\tilde a_{1.2}}^2 + \bar\sigma_{\tilde b_{12}}^2 x_2^2 + \bar\sigma_{1.2}^2 \tag{15}$$


If the population first-order variance is not known and
must be estimated from the sample value, then the ratio of
$X_1 - X'_1$ to its estimated standard error will have a
sampling distribution that is of the form of the $t$ distribution,
with $n = N - 2$. Hence to determine "confidence limits"¹ for
the actual value of the second-semester English grade, set

$$X_{1,\text{actual}} = X'_{1,\text{estimated}} \pm t_{.05}\sqrt{\frac{\tilde{\sigma}^2_{1.2}}{N} + \tilde{\sigma}^2_b\, x_2^2 + \tilde{\sigma}^2_{1.2}}$$

where $t_{.05}$ is the .05 value of a $t$ table for $n = N - 2$, and the
tildes denote estimates computed from the sample. For large
samples, $t_{.05}$ can be replaced by 1.96, and the normal curve can
be used.

As already noted, $\tilde{\sigma}^2_{1.2}/N = 4.83$ and $\tilde{\sigma}^2_b = .002159$. Furthermore,
$x_2^2 = (180 - 204.1)^2 = 580.81$. Thus the confidence limits


1 Confidence limits in the sense that in repeated sampling these limits will 
include the actual value 95 per cent of the time. They are not confidence 
limits in the usual use of the words since they do not refer to a population 
parameter. 

2 See pp. 419-421. 
for the actual value of the second-semester English grade are

$$X_{1,\text{upper}} = 197.4 + 1.96\sqrt{4.83 + (.002159)(580.81) + 391.07} = 197.4 + 1.96(19.93) = 236.5$$

and

$$X_{1,\text{lower}} = 197.4 - 1.96(19.93) = 158.3$$

The chances are 95 out of 100 that the range 158.3-236.5 will 
include the actual second-semester grade for the student whose 
grade is being predicted. 
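To check the arithmetic, the limits can be reproduced with a short Python sketch. The function `forecast_limits` and its parameter names are hypothetical and chosen only for illustration. The inputs are the figures quoted above, with one stated assumption: the third term under the radical, 391.07, is read as the estimated first-order variance $\tilde{\sigma}^2_{1.2}$, which agrees with $391.07/81 \approx 4.83$. As in the text, 1.96 stands in for $t_{.05}$ because N is large. For a small sample one would pass the tabled $t_{.05}$ for $n = N - 2$ instead.

```python
import math

def forecast_limits(x1_est, var_12, n, var_b, x2_dev, z=1.96):
    """Return (lower, upper, standard error) for an individual future X1.

    x1_est : value of X1 read from the sample regression line
    var_12 : estimated first-order variance (sigma~^2_{1.2}), assumed input
    n      : number of cases in the past sample
    var_b  : estimated sampling variance of the regression coefficient
    x2_dev : deviation of the given X2 from the sample mean of X2
    z      : multiplier; 1.96 for large samples, else t.05 with N - 2 d.f.
    """
    # Sampling variance of the regression value (error in the line itself)
    var_line = var_12 / n + var_b * x2_dev ** 2
    # Add the scatter of an individual grade about the regression line
    se = math.sqrt(var_line + var_12)
    return x1_est - z * se, x1_est + z * se, se

lower, upper, se = forecast_limits(197.4, 391.07, 81, 0.002159, 180 - 204.1)
print(f"{lower:.1f} to {upper:.1f} (standard error {se:.2f})")
```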

Since N is large, these limits for the actual value of $X_1$ do not
differ materially from those previously obtained on the assumption
that the sample regression and first-order variance are the
population regression and first-order variance.¹ For small
samples, however, the allowance for sampling errors in the
regression line and the first-order variance will make a much
greater difference.

Similar analyses will determine confidence limits for forecasts
from a plane of regression. For small samples, the limits are
given by

$$X_1 = X'_1 \pm t_{.05}\sqrt{\tilde{\sigma}^2_{X'_1} + \tilde{\sigma}^2_{1.23\cdots m}} \qquad (16)$$

where $\tilde{\sigma}^2_{X'_1}$ is the estimated sampling variance of the regression value, and
$t_{.05}$ is the .05 point of a $t$ table for $n = N - m$. If the
samples are large, then 1.96 may be used in place of $t_{.05}$.

1 See p. 388. 



CHAPTER XVI 

PROBLEMS INVOLVING TWO SAMPLES 


Some of the more interesting problems in statistical analysis
involve two samples. A sample poll, for example, is taken 1
month before election and another 2 days before election. The
former shows a Democratic vote of 54 per cent and a Republican
vote of 46 per cent; the latter a Democratic vote of 49 per cent
and a Republican vote of 51 per cent. If the samples both
number 100, can the difference in the two percentages be taken
to represent a real change in sentiment, or can such a difference
be reasonably attributed to the chance fluctuations of sampling?
Again, 200 automobile tires of make A show an average mileage
of 16,400 miles; 300 tires of make B, subjected to the same test,
show an average mileage of 15,900 miles. Does the difference in
mileage indicate that make A is really better than make B, or
can the difference be reasonably attributed to chance? It is
with such problems as these that the present chapter will be
concerned.

OUTLINE OF THE GENERAL ARGUMENT 

Proceeds from a Null Hypothesis. The statistical analysis by
which it is determined whether the difference between two
samples can reasonably be attributed to chance starts with a null
hypothesis. First, the hypothesis is set up that the populations
from which the two samples are taken do not differ with respect
to the characteristic in question. This is called the "null
hypothesis." Second, some statistic is computed from the two
samples that is based upon the difference in the characteristic
being studied. This may be the difference between their mean values,
the ratio of their two variances, or the like. Third, the sampling
distribution of this statistic is derived on the basis of the null
hypothesis. That is, the set of all possible pairs of samples from
the assumed populations is derived, and the percentages of these
pairs of samples having various values of the selected statistic
are determined. Fourth, a special subset of the set of all possible
pairs of samples is marked as an appropriate region of rejection.
For example, this might be the upper 5 per cent of a normal
curve. Finally, it is noted whether the statistic for the given
pair of samples falls in the chosen region of rejection. If it does,
the null hypothesis is rejected and the difference between the two
samples is not attributed to chance. This general procedure will
be explained more fully below with reference to several selected
problems.

Sampling Distribution of the Difference between Two Independent
Sample Statistics. Before turning to particular problems, however,
two general relationships are worth noting. Often the
statistic that is taken to measure the difference between two
samples is the arithmetic difference between two individual
sample statistics. Thus the statistic may be $\bar{X}_1 - \bar{X}_2$, $\sigma_1 - \sigma_2$,
or the like.

In general, let such a statistic be designated as $\theta$, and let the
two individual sample statistics be designated as $\theta_1$ and $\theta_2$, so
that $\theta = \theta_1 - \theta_2$. It is shown in the Appendix to this
chapter that if the two samples are independent of each other,
whatever the nature of the populations from which the two
samples have been drawn, the mean of the sampling distribution
of $\theta$ is equal to the mean of the sampling distribution of $\theta_1$ minus
the mean of the sampling distribution of $\theta_2$, and the variance of
the sampling distribution of $\theta$ is equal to the variance of the
sampling distribution of $\theta_1$ plus the variance of the sampling
distribution of $\theta_2$. In summary, if the two samples are independent,

$$\bar{\theta} = \bar{\theta}_1 - \bar{\theta}_2$$

$$\sigma^2_{\theta} = \sigma^2_{\theta_1} + \sigma^2_{\theta_2}$$

These general relationships are worth remembering. 
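A small simulation can illustrate these two relationships, though it cannot prove them. The sketch below works under assumed conditions. $\theta_1$ and $\theta_2$ are taken to be the means of independent samples of 4 and 5 cases, drawn from an exponential population and from a uniform population respectively; both were chosen because neither is normal. The function name, the sample sizes and the seed are arbitrary.

```python
import random
import statistics

def simulate_statistics(reps=20_000, n1=4, n2=5, seed=1):
    """Return lists of theta1 and theta2 over `reps` independent pairs of samples."""
    rng = random.Random(seed)
    theta1, theta2 = [], []
    for _ in range(reps):
        # Sample 1: exponential population with mean 1 and variance 1 (not normal)
        s1 = [rng.expovariate(1.0) for _ in range(n1)]
        # Sample 2: uniform population on (0, 6), mean 3 and variance 3 (not normal)
        s2 = [rng.uniform(0.0, 6.0) for _ in range(n2)]
        theta1.append(sum(s1) / n1)
        theta2.append(sum(s2) / n2)
    return theta1, theta2

theta1, theta2 = simulate_statistics()
theta = [a - b for a, b in zip(theta1, theta2)]

mean_theta = statistics.fmean(theta)                     # theory: 1 - 3 = -2
var_theta = statistics.pvariance(theta)
var_sum = statistics.pvariance(theta1) + statistics.pvariance(theta2)  # theory: 1/4 + 3/5
print(f"mean {mean_theta:.3f}, var(theta) {var_theta:.3f}, var1 + var2 {var_sum:.3f}")
```

With 20,000 pairs of samples, the empirical variance of $\theta$ should lie close to the sum of the two empirical variances. Both should be close to $1/4 + 3/5 = .85$.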



DIFFERENCE BETWEEN TWO SAMPLE PERCENTAGES 

A problem that often arises is whether the percentage of 
favorable cases in one sample is significantly different from the 
percentage of favorable cases in another sample. Consider, for 
example, a recent senatorial campaign in Kentucky in which
the two candidates for the Democratic nomination were Barkley
and Chandler. On Apr. 10 a sample poll was taken which
gave Barkley 67 per cent of the total vote and Chandler 33 per
cent. On July 8 another sample poll was taken which gave
Barkley 64 per cent of the vote and Chandler 36 per cent.¹

If the first sample numbered 100 votes and the second 200
votes, was it to be inferred that, during the 3 months from April
to July, public sentiment shifted from Barkley to Chandler, or
could the difference between the two returns have been reasonably
attributed to chance? This is the question that the following
argument will seek to answer.

The Argument. The Null Hypothesis. In accordance with 
the general procedure outlined above, the first step in the analysis 
is to set up a null hypothesis. In the present instance this would 
state that the sentiment of the voters as a whole was the same 
on July 8 as it was on Apr. 10 and that the difference in sample 
returns for these two dates was merely the chance result of
random sampling. The purpose of the following analysis is to
see whether this hypothesis should be accepted or rejected. 

The Statistic. The second step in the analysis is to choose an
appropriate statistic for measuring the difference between the
two sample returns. The statistic usually selected is the difference
between the two sample percentages. Call this statistic
$p_d$, and let it be defined by $p_d = p'_1 - p''_1$, where $p'_1$ is the sample
percentage in favor of Barkley on Apr. 10 and $p''_1$ is the sample
percentage in favor of Barkley on July 8.

The Sampling Distribution of $p_d$. It was shown in Chap. IX
that the percentage of favorable cases in a sample would vary
from sample to sample in accordance with the binomial distribution.
The mean of the distribution was found to be equal to
$p_1$ and the variance $p_1 p_2/N$, where $p_1$ is the percentage of
favorable cases in the population and $p_2$ the percentage of
unfavorable cases. It was also pointed out that, if the sample
was large, the binomial distribution could be approximated by a
normal curve whose mean and variance were the same as those
of the binomial distribution.

The same sort of result can be demonstrated in the present
instance.² Thus it can be shown that the difference between the
percentages of two large samples independently derived from the
same population (the null hypothesis) is distributed approximately
in the form of a normal frequency distribution with a
mean of zero and a variance equal to the sum of the variances of
the two individual sample percentages. That is, $p_d$ can be taken
to be normally distributed with a mean of zero and a variance
equal to $\dfrac{p_1 p_2}{N'} + \dfrac{p_1 p_2}{N''}$, where $p_1$ and $p_2$ are the percentages of
favorable and unfavorable cases in the population from which the
two samples are assumed to have been taken and $N'$ and $N''$ the
number in each of the samples.

1 Gallup, G., and S. F. Rae, "Is There a Bandwagon Vote?" The Public
Opinion Quarterly, Vol. 4 (1940), p. 245.

2 The mathematical analysis is beyond the scope of this book.

In the present instance, $p_1$ and $p_2$ are not known and must
therefore be estimated from the samples. This can be done by
taking $p_1$ as the weighted average of $p'_1$ and $p''_1$ and by taking $p_2$
as 1 minus the estimated value of $p_1$. Thus $p_1$ and $p_2$ can be
estimated from the equations

$$p_1 = \frac{N' p'_1 + N'' p''_1}{N' + N''} \quad \text{and} \quad p_2 = 1 - p_1 \qquad (2)$$

For the data given, this gives

$$p_1 = \frac{100(.67) + 200(.64)}{100 + 200} = .65 \quad \text{and} \quad p_2 = 1 - p_1 = .35$$

Hence, for the problem in hand, $p_d$ can be taken as normally distributed
with a mean of zero and a variance equal to

$$\sigma^2_{p_d} = (.65)(.35)\left(\frac{1}{100} + \frac{1}{200}\right) = .003413$$

The Region of Rejection. Let it be assumed that in instances
of this kind the polling agency does not wish to reject the null
hypothesis when it is true more than 5 times out of 100. This
means that the region of rejection should be of such a size that the
probability of a sample falling within it should be just equal
to .05.

Since a real shift in public sentiment away from Barkley and
toward Chandler might ultimately lead to the election of the
latter, whereas a shift toward Barkley would merely strengthen
his existing lead, it may be presumed that the polling agency
would be more concerned about accepting the null hypothesis
when the former shift in public sentiment had occurred than
when the latter had taken place. On the assumption of such an
attitude, it would appear that the upper .05 of the sampling
distribution of $p_d$ would be the best region of rejection to adopt,
for the probability of a sample $p_d$ falling in this region would
then be greatest if public sentiment had actually shifted toward
Chandler. The region of rejection in the present instance will
therefore be the upper .05 tail of a normal curve whose mean is
zero and whose variance is .003413.

Testing the Null Hypothesis. The final step is to test the given
hypothesis by noting whether the given sample falls in the
selected region of rejection. Since this region comprises the
upper .05 tail of the normal curve, it will include all values of
$p_d/\sigma_{p_d}$ equal to or greater than 1.645. In the present instance,

$$p_d = 67 \text{ per cent} - 64 \text{ per cent} = 3 \text{ per cent}$$

and

$$\sigma_{p_d} = \sqrt{.003413} = 5.8 \text{ per cent}$$

Hence,

$$\frac{p_d}{\sigma_{p_d}} = \frac{3 \text{ per cent}}{5.8 \text{ per cent}} = .517$$

The sample obviously does not fall in the region of rejection.
The null hypothesis is thus accepted, and the difference in the
sample polls is attributed to chance.
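The whole computation for the poll data can be sketched in a few lines of Python. The function name `pooled_difference_test` is our own. If no intermediate values are rounded, the variance comes to $(.65)(.35)(1/100 + 1/200) = .0034125$ and the ratio to about .514. The ratio of .517 results when $\sigma_{p_d}$ is first rounded to 5.8 per cent. The conclusion is the same either way.

```python
import math

def pooled_difference_test(p1, n1, p2, n2):
    """Return (p_d, sigma_pd, ratio) for two independent sample proportions,
    estimating the common population proportion as in Eq. (2)."""
    p_common = (n1 * p1 + n2 * p2) / (n1 + n2)   # weighted average of p'_1 and p''_1
    q_common = 1 - p_common
    var_pd = p_common * q_common * (1 / n1 + 1 / n2)
    sigma_pd = math.sqrt(var_pd)
    p_d = p1 - p2
    return p_d, sigma_pd, p_d / sigma_pd

# April poll (100 votes, 67 per cent Barkley) minus July poll (200 votes, 64 per cent)
p_d, sigma_pd, ratio = pooled_difference_test(0.67, 100, 0.64, 200)

# One-sided region of rejection: the upper .05 tail, ratio >= 1.645
reject = ratio >= 1.645
print(f"variance = {sigma_pd ** 2:.6f}, ratio = {ratio:.3f}, reject: {reject}")
```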

In conclusion, it should be noted once again that the foregoing 
analysis relates only to independent samples. If the same people 
were questioned on July 8 as those questioned in April, the sample 
results would not be independent of each other and the above 
analysis could not be applied. 

Alternative Argument. Instead of using the foregoing argument,
it would have been possible to solve the given problem by
a test of independence such as that described in Chap. XII.
The steps in this alternative argument are as follows:

First, the results of the two polls are set up in a contingency
table in which the votes are classified according to candidates
on the one hand and dates of polls on the other. This has been
done in Table 45. In this form the problem can be stated as
follows: Is the classification according to candidates independent
of the classification according to dates, or does the division of
votes vary significantly from April to July?

Table 45. Cross Classification of Poll Data

Date of poll   Votes favoring   Votes favoring   Total
               Barkley          Chandler         votes
April               67               33            100
July               128               72            200
Totals             195              105            300

The null hypothesis assumes that the true division of votes
between the two candidates is the same in April and July and
that the apparent differences are due to the chance effects of
sampling. On the basis of this hypothesis, estimates are made
of the true division of sentiment by taking a weighted average
of the sample results for the 2 months. Thus the percentage in
favor of Barkley may be estimated as equal to

$$\frac{100(.67) + 200(.64)}{300} = \frac{195}{300} = .65$$

and the percentage in favor of Chandler may be estimated as
equal to

$$\frac{100(.33) + 200(.36)}{300} = \frac{105}{300} = .35$$

The total vote in each month is then distributed in the same
proportion as these estimates of the hypothetical division of
sentiment. The results are shown in Table 46. The question then
is: Do the actual results, shown in Table 45, differ significantly
from the expected results shown in Table 46?


Table 46. Poll Results That Would Be Expected on the
Assumption of Independence

Date of poll   Expected votes      Expected votes      Total
               favoring Barkley    favoring Chandler   votes
April                65                  35             100
July                130                  70             200
Totals              195                 105             300






As pointed out in Chap. XIII, questions of this kind can be
answered by calculating the statistic

$$\sum \frac{(\text{actual number in each cell} - \text{expected number})^2}{\text{expected number}} \qquad (3)$$

and by making use of the fact that this statistic has a sampling
distribution of the form of a $\chi^2$ distribution. The degrees of
freedom, it is noted, will be equal to $(r - 1)(c - 1)$, where $r$ is the
number of rows in the contingency table and $c$ the number of
columns. Since $r$ and $c$ both equal 2 in the present problem, it
follows that statistic (3) for this problem will be distributed
like $\chi^2$ with $n = (1)(1) = 1$.

It remains now to determine what part of the sampling distribution
will serve as the best region of rejection. In the previous
argument, the upper .05 tail of the normal curve (the sampling
distribution used there) was considered the best region to
employ, for it was assumed that the polling agency would be
especially anxious not to accept the null hypothesis if the actual
trend in public sentiment was away from Barkley and toward
Chandler. The same assumption will be made here.

In view of this assumption, it might appear at first glance that
the upper .05 tail of the sampling distribution would again be
the most appropriate region of rejection to employ, but this is
not so. Statistic (3) does not take account of signs, and large
values might arise from sample differences in favor of Barkley
just as readily as from sample differences in favor of Chandler.
If the region of rejection in this case is to be comparable with the
upper .05 tail of the normal curve used in the previous argument,
it should include only those values of the statistic that arise from
differences in favor of Chandler. Such differences, on the basis
of the null hypothesis, will occur only 50 per cent of the time.
Hence a .05 region of rejection comparable to that of the previous
argument may be taken to consist of all values of statistic (3)
that arise from differences in favor of Chandler and that at the
same time are equal to or greater than the upper .10 point of the
$\chi^2$ distribution ($n = 1$), that is, equal to or greater than 2.706.
Half of the upper .10 is .05, and $2.706 = (1.645)^2$, so this region
corresponds exactly to the one-sided region used before.

The problem can now be solved. For the given data the value
of statistic (3) is as follows:

$$\frac{(67 - 65)^2}{65} + \frac{(33 - 35)^2}{35} + \frac{(128 - 130)^2}{130} + \frac{(72 - 70)^2}{70} = .264$$

Since this is less than 2.706, the sample does not fall in the
selected region of rejection, and the null hypothesis is again
accepted.¹ As before, the difference between the two samples is
attributed to the chance effects of sampling.
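The alternative argument can also be sketched in Python. The helper `chi_square_2x2` is a hypothetical name. The expected counts are computed as row total times column total divided by the grand total, which reproduces Table 46. The last step relies on a fact noted in the footnote: for $n = 1$, the square root of $\chi^2$ behaves like the absolute value of a standard normal deviate. Because of this, `statistics.NormalDist` from the Python standard library is enough, and no $\chi^2$ table is needed.

```python
import math
from statistics import NormalDist

def chi_square_2x2(observed):
    """Statistic (3) for a 2 x 2 table, with expected counts computed on the
    assumption of independence (row total * column total / grand total)."""
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    grand = sum(row_tot)
    expected = [[r * c / grand for c in col_tot] for r in row_tot]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    return chi2, expected

# Table 45: rows are April and July; columns are Barkley and Chandler
observed = [[67, 33],
            [128, 72]]
chi2, expected = chi_square_2x2(observed)

# Only differences in favor of Chandler count toward rejection:
# July's Chandler vote must exceed its expected value.
favors_chandler = observed[1][1] > expected[1][1]
reject = favors_chandler and chi2 >= 2.706   # upper .10 point of chi^2, n = 1

# For n = 1, sqrt(chi2) behaves like |x/sigma|, so the two-sided probability is
p_two_sided = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
print(f"chi2 = {chi2:.3f}, P = {p_two_sided:.2f}, reject: {reject}")
```

The two-sided probability comes out at about .6, consistent with the comparison in the footnote.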

1 It is interesting to compare probabilities in this problem. By the first
method $p_d/\sigma_{p_d} = .517$, and the probability of as great or greater a value in either
direction (+ or −) is approximately .60. By the second method the value
of $\chi^2$ is .264, and the probability of as great or greater a value is again roughly
DIFFERENCE BETWEEN TWO INDEPENDENT SAMPLE MEANS

When the Populations Have a Common Known Variance. 

The Problem. The foregoing section was concerned with the
difference between two samples from a discrete population.
Similar problems arise in the case of continuous data. A problem
that often occurs is whether the means of two independent
samples are significantly different. This problem is attacked in
different ways, depending on the conditions involved. In this
section it will be assumed that the variances of the populations
from which the two samples have been drawn are the same and
that this common value is known. The question to be tested
will be: Are the means of the populations also the same? Only
normal populations will be considered here; nonnormal cases
will be discussed in Chap. XVIII.

Some years ago the New York Sun published employment
data for a large number of industrial companies. Data were
given for each company for the years 1929 and 1935. A random
sample of 10* of the smaller companies was taken from each of these
2 years; the mean of the first was found to be 89.1 men, and the
mean of the second 123.5 men. A different set of companies was
taken for each year so as to make the samples independent of
each other. If the standard deviations of the populations are
both known in this instance to be 100 and if the populations are
taken to be normal, can it be inferred from these two samples
that industrial employment among small companies was in
general larger in 1935 than in 1929? This is the problem that
the following argument will seek to solve.

The Null Hypothesis. To determine whether the means of 
the two samples are significantly different, it will first be assumed 
that they are not different, and then the consequences of this 
assumption will be compared with the actual results. The null 
hypothesis will thus be that the two samples are from identical 
normal populations with standard deviations of 100. 


.60. In fact, for $n = 1$, $\sqrt{\chi^2}$ is distributed like double the upper half of a
normal curve, and in the given problem $\sqrt{.264}$ equals approximately .514,
which is the same as .517 except for errors arising from the use of decimals.

* Usually a larger sample than this would be taken. A small sample is 
taken here to show that the analysis is applicable to both large and small 
samples. 
The Statistic. The statistic that is generally selected in
problems of this sort is the difference between the sample means.
If the mean employment of the 10 companies in 1929 is indicated
as $\bar{X}_1$ and the mean employment of the 10 companies in 1935 is
indicated as $\bar{X}_2$, then the selected statistic is defined as

$$\bar{X}_{1-2} = \bar{X}_1 - \bar{X}_2$$

The Sampling Distribution of $\bar{X}_{1-2}$. It can be shown by mathematical
analysis¹ that, when the populations from which the two
samples are independently drawn are normal populations, the
sampling distribution of $\bar{X}_{1-2}$ is also normal. On the null hypothesis,
the mean of the distribution is zero and its variance is equal to the
identical variance of the two populations multiplied by $\frac{1}{N_1} + \frac{1}{N_2}$.
That is, if all possible pairs of samples of 10 each were selected
from the assumed populations and the differences between the
means of these samples were computed, it would be found that
the differences would form a normal frequency distribution with
a mean of zero and a variance equal to $\sigma^2\left(\frac{1}{N_1} + \frac{1}{N_2}\right)$. Since in
the given problem $\sigma$ is 100 and $N_1 = N_2 = 10$, it follows that
$\bar{X}_{1-2}$ has a sampling distribution whose mean is zero and whose
variance is $(100)^2\left(\frac{1}{10} + \frac{1}{10}\right) = 2{,}000$.

The Region of Rejection. Let the risk of rejecting the null
hypothesis when it is true be placed at .05. This means that the
total region of rejection should be of this size. Furthermore,
there appears to be no special reason in this instance to put all
the region of rejection at one end of the sampling distribution.
For, in the absence of any special motive in making the statistical
investigation, it would appear to be as bad an error to accept
the null hypothesis when in fact employment was greater in 1929
than in 1935 as it would be to accept the null hypothesis when in
fact employment was greater in 1935 than in 1929. On this
basis the total region of rejection will be split equally so as to
include the .025 tails at each end of the sampling distribution.
The points marking these two tails, it will be recalled, are
$x/\sigma = \pm 1.96$.

1 The analysis is essentially the same as that showing that the mean of a
sample itself is normally distributed (cf. Chap. X).
The Test of the Null Hypothesis. The difference between
the sample means is $89.1 - 123.5 = -34.4$. The variance of the
sampling distribution of $\bar{X}_{1-2}$ was computed to be 2,000. The
standard deviation is the square root of this, or 44.7. The ratio
of the difference between the two means to its standard deviation
is thus $-34.4/44.7 = -.77$, which lies well within the $\pm 1.96$ points
marking the region of rejection. Accordingly, there is no basis
for rejecting the null hypothesis, and the difference between the
two sample means can presumably be attributed to chance.
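Here is a minimal Python sketch of this test. Like the text, it assumes that both populations are normal with a known common standard deviation of 100. The function name is hypothetical.

```python
import math

def z_known_common_sigma(mean1, mean2, sigma, n1, n2):
    """Ratio of the difference of two independent sample means to its
    standard deviation, sigma * sqrt(1/N1 + 1/N2), for normal populations
    with a common, known standard deviation sigma."""
    var_diff = sigma ** 2 * (1 / n1 + 1 / n2)
    return (mean1 - mean2) / math.sqrt(var_diff), var_diff

z, var_diff = z_known_common_sigma(89.1, 123.5, sigma=100, n1=10, n2=10)
reject = abs(z) > 1.96      # symmetrical .05 region: .025 in each tail
print(f"variance = {var_diff:.0f}, ratio = {z:.2f}, reject: {reject}")
```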

When the Populations Have a Common but Unknown Variance.
In some problems it may be assumed that the variances
of the populations from which the two samples have been taken
are identical, but the value of this common variance may not be
known. In this instance the population variance must be
estimated from the samples. For small samples, this requires
some modification in the foregoing analysis.

When two samples are taken from normal populations with
identical variances, the maximum-likelihood estimate of this
common variance based on the sampling variation in the variances
that is independent of the difference between the sample means
is¹

$$\tilde{\sigma}^2 = \frac{N_1 \sigma_1^2 + N_2 \sigma_2^2}{N_1 + N_2 - 2} \qquad (4)$$

where $\sigma_1^2$ and $\sigma_2^2$ are the two sample variances and $N_1$ and $N_2$ are
the numbers in the samples. Except for the $-2$ in the denominator,
this is merely a weighted mean of the two sample variances. The reason for the
division by $N_1 + N_2 - 2$ instead of $N_1 + N_2$ lies in the fact that
each of the sample variances is measured from its own mean
instead of the true population mean. Since the sample mean
itself varies from sample to sample, this reduces somewhat the
average value of the sample variances as compared with the true
population variance. To correct for this bias in both the sample
variances, the factor $N_1 + N_2 - 2$ is substituted for $N_1 + N_2$.
The value of $\tilde{\sigma}^2$ is said in this instance to be an estimate of $\sigma^2$
based on $N_1 + N_2 - 2$ degrees of freedom.
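A small simulation shows the effect of the divisor. This is an illustrative sketch under assumed conditions: two normal populations with different means and a common standard deviation of 2, samples of 5 each, and an arbitrary seed. It compares the average of the pooled estimate with divisor $N_1 + N_2$ and with divisor $N_1 + N_2 - 2$ against the true variance of 4. The function names are hypothetical.

```python
import random

def sum_of_squares(xs):
    """Sum of squared deviations from the sample's own mean
    (equal to N times the sample variance computed with divisor N)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def average_pooled_estimates(n1, n2, sigma, reps=20_000, seed=2):
    """Average pooled variance over repeated pairs of normal samples,
    using divisor N1 + N2 and divisor N1 + N2 - 2."""
    rng = random.Random(seed)
    naive_total = corrected_total = 0.0
    for _ in range(reps):
        # The population means differ; only the common variance is estimated.
        s1 = [rng.gauss(10.0, sigma) for _ in range(n1)]
        s2 = [rng.gauss(50.0, sigma) for _ in range(n2)]
        ss = sum_of_squares(s1) + sum_of_squares(s2)
        naive_total += ss / (n1 + n2)
        corrected_total += ss / (n1 + n2 - 2)
    return naive_total / reps, corrected_total / reps

# True common variance is 2.0 ** 2 = 4.0
naive_avg, corrected_avg = average_pooled_estimates(5, 5, sigma=2.0)
print(f"divisor N1+N2: {naive_avg:.2f}   divisor N1+N2-2: {corrected_avg:.2f}")
```

The first average should fall near $4 \times 8/10 = 3.2$, which shows the downward bias described above. The second should fall near 4.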

1 The probability of getting the two samples can be broken up into two
parts, one depending only on the two sample means, the other only on the
two sample variances. When the common population variance is chosen
so as to maximize this second part of the total probability, its value is found
to be that given in the text.
Let the foregoing estimate of the population variance be multiplied
by $\frac{1}{N_1} + \frac{1}{N_2}$, and take the ratio of the difference between
the two sample means to the square root of this quantity. The
resulting ratio, which may be written

$$\frac{\bar{X}_1 - \bar{X}_2}{\tilde{\sigma}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}} \qquad (5)$$

is similar to the ratio $\dfrac{\bar{X}_1 - \bar{X}_2}{\sigma\sqrt{(1/N_1) + (1/N_2)}}$ that was found in the previous problem to be
distributed in accordance with the standard normal curve. The
only difference is that the estimated rather than the actual
population variance is now used. This difference, however,
causes statistic (5) to have a sampling distribution that is
nonnormal for small samples. Actually, the sampling distribution
of this statistic is of the form of the $t$ distribution, with $n$ in the
$t$ formula, i.e., the degrees of freedom, equal to $N_1 + N_2 - 2$.

Since the $t$ distribution approaches the normal curve when $n$
is large (say greater than 30), the use of the estimated value of
the population variance leads to no change in the analysis if the
samples are large.¹ When the samples are small, however, i.e.,
when $N_1 + N_2 - 2$ is less than 30, it is better to use the $t$ distribution
in place of the normal distribution for testing the given
hypothesis. This is the only fundamental change required in
the previous analysis.

To illustrate the estimation of the common population variance,
consider once again the data on industrial employment in
1929 and 1935. Let it be assumed that the variance in employment
of small industrial companies is the same in both years but

1 The $t$ distribution is leptokurtic, and its variance equals $\dfrac{n}{n - 2}$. Hence
a better approximation can be obtained for values of $n$ between 30 and 100 by
multiplying the statistic $(\bar{X}_1 - \bar{X}_2)\big/\left(\tilde{\sigma}\sqrt{1/N_1 + 1/N_2}\right)$ by $\sqrt{\dfrac{n - 2}{n}}$ before looking
up the value in the normal table. It will be noted that here
$n = N_1 + N_2 - 2$.
that its value is not known and must therefore be estimated from
the samples. Suppose that the variance of the 1929 sample is
9,834.5 and the variance of the 1935 sample is 18,832.1. As
stated above, the maximum-likelihood estimate of the common
variance is found as follows:

$$\tilde{\sigma}^2 = \frac{10(9{,}834.5) + 10(18{,}832.1)}{10 + 10 - 2} = 15{,}925.9$$

The square root of this is 126.2, and the statistic

$$\frac{\bar{X}_1 - \bar{X}_2}{\tilde{\sigma}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}$$

has the value

$$\frac{89.1 - 123.5}{126.2\sqrt{\frac{1}{10} + \frac{1}{10}}} = -.61$$

For $n = 10 + 10 - 2 = 18$, the .025 points of the $t$ distribution
are $\pm 2.101$. Values of $t$ numerically greater than 2.101 will thus
constitute a symmetrical .05 region of rejection. It is clear that
the sample value of $t$ does not in this instance fall in the region of
rejection, and again the null hypothesis is not rejected. As
before, the difference between the two sample means is apparently
due to chance.
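The pooled-variance computation can be sketched in Python as follows. The function name `pooled_t` is hypothetical. The sample variances are assumed to be computed with divisor N, as the weighted mean in the text implies. The critical value 2.101 is taken from the text rather than computed. Without intermediate rounding, the ratio comes out at about −.61.

```python
import math

def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """Ratio of X1 - X2 to its estimated standard error, using the pooled
    estimate (N1*var1 + N2*var2) / (N1 + N2 - 2) of the common variance.
    var1 and var2 are sample variances computed with divisor N."""
    df = n1 + n2 - 2
    pooled_var = (n1 * var1 + n2 * var2) / df
    pooled_sd = math.sqrt(pooled_var)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df, pooled_sd

t, df, pooled_sd = pooled_t(89.1, 123.5, 9834.5, 18832.1, n1=10, n2=10)
reject = abs(t) > 2.101   # .025 points of t for n = 18, as given in the text
print(f"pooled sd {pooled_sd:.1f}, t = {t:.2f} with {df} d.f., reject: {reject}")
```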

If the samples had numbered 50 cases each instead of 10, the
value of

$$\frac{\bar{X}_1 - \bar{X}_2}{\tilde{\sigma}\sqrt{\frac{1}{N_1} + \frac{1}{N_2}}}$$

would have been

$$\frac{89.1 - 123.5}{118\sqrt{\frac{1}{50} + \frac{1}{50}}} = -1.46$$

Since the samples are large, the normal curve can in this case
be used instead of the $t$ curve, even though the population variance
has been estimated. The region of rejection will therefore
consist of values of $x/\sigma$ numerically greater than 1.96. The
sample value of $x/\sigma$ is numerically 1.46, and this again fails to fall
in the region of rejection. The null hypothesis continues to be
accepted even though the samples are now larger.
When the Populations Do Not Have the Same Variance. If it
cannot be assumed that the populations from which the two
samples have been drawn have identical variances, only a rough
test can be employed to determine whether the populations might
have the same means. If the samples are fairly large (say 100
or more), it can be assumed with a fair degree of accuracy that
the sampling distribution of the difference between two sample
means is a normal distribution with a mean of zero and a variance
that is approximately equal to the variance of sample 1 divided
by $N_1$ plus the variance of sample 2 divided by $N_2$.

That is, for large samples, $\sigma^2_{\bar{X}_1 - \bar{X}_2}$ can be taken as equal to
$\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}$, and the ratio

$$\frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}} \qquad (6)$$

can be treated as if it were normally distributed. Except for the
different method of estimating $\sigma_{\bar{X}_1 - \bar{X}_2}$, the procedure is essentially
the same as that outlined in the foregoing sections, and illustrations
will be omitted.
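Because illustrations are omitted here, the following is only a sketch of ratio (6) in Python. The function name is hypothetical. The means, variances and sample sizes in the usage line are invented for illustration and are not data from the text.

```python
import math

def large_sample_ratio(mean1, mean2, var1, var2, n1, n2):
    """Ratio (6): (X1 - X2) / sqrt(var1/N1 + var2/N2), using each sample's
    own variance; refer it to the normal curve only for fairly large
    samples (the text suggests 100 or more)."""
    se = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / se

# Hypothetical figures, for illustration only (not data from the text)
ratio = large_sample_ratio(52.0, 50.0, var1=64.0, var2=36.0, n1=100, n2=100)
reject = abs(ratio) > 1.96    # symmetrical .05 region of rejection
print(f"ratio = {ratio:.2f}, reject: {reject}")
```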

CORRELATED SAMPLES 

The Problem. If in the foregoing problem the same companies
had been taken in 1935 as were taken in 1929, the two samples
would not have been independent and the foregoing analysis
could not have been validly applied. Data often occur in this
form. A group of students, for example, may be tutored for one
examination and not tutored for another. Again, a set of hogs
may be fed one diet for one month and the same set of hogs fed
another diet for another month. How in these cases is a statistical
test to be made of the effect of tutoring on students' grades
or of the effect of diet on the rate of growth of hogs? It is these
questions that the following analysis seeks to answer.

Testing Individual Differences. When two samples relate to
the same set of individuals, the simplest method of analysis is to
take the difference between the two results for each individual.
If each sample numbers $N$ cases, this process will give a set of $N$
individual differences. If the two sets of data are not really
different, the whole population of individual differences will have a
mean of zero. Sample sets of differences will, of course, have
means that are not zero, but these sample means will tend to be
distributed around this population mean of zero.

If $\sigma^2$ is the variance of the whole population of differences, the
distribution of sample mean differences will have a variance of
$\sigma^2/N$. This distribution will be normal in form (exactly so if the
differences are normally distributed, approximately so otherwise
when $N$ is large), so that, if the variance of the population of
differences is known, the mean of any particular sample of differences
can be tested by taking the ratio of this mean value to $\sigma/\sqrt{N}$ and
looking up the result in a normal frequency table.

If the variance of the population of differences is not known, it
must be estimated from the given sample. The maximum-likelihood
estimate of the population variance that can be made
independently of the sample mean is $\tilde{\sigma}^2 = \dfrac{N}{N - 1}\sigma_d^2$, where
$\sigma_d^2$ is the variance of the sample of differences. The ratio of the
sample mean difference to $\tilde{\sigma}/\sqrt{N}$ gives a statistic whose sampling
distribution is of the form of the $t$ distribution with $n$, the
degrees of freedom, equal to $N - 1$. When the variance of the
population of differences is not known, therefore, but must be
estimated from the sample, the $t$ distribution is used to test the
significance of the sample mean difference. Of course, if $N$ is
large, the normal curve can be used as an approximation to the
$t$ curve. Illustrations of these procedures follow.

Illustrations. Employment figures for 10 small industrial
companies in 1929 and 1935 are given in Table 47, which also
shows the individual differences for the 2 years. The mean of
these differences is 10.5, and the question is: Does this sample
mean difference differ significantly from zero, or can its positive
value reasonably be attributed to chance?

First suppose that the standard deviation of the whole population
of differences is known to be 50. Then the standard deviation
of the distribution of sample means of differences is equal to
$50/\sqrt{10} = 15.8$, and the ratio of the given sample mean to this
standard error is $10.5/15.8 = .66$. If the null hypothesis is true, this
ratio, as pointed out above, is distributed in accordance with the
standard normal curve. If the region of rejection is taken as the
values of $\bar{x}_d/\sigma_{\bar{x}_d}$ that are numerically greater than 1.96 (a
symmetrical region appears to be justified here), then this sample
value of $\bar{x}_d/\sigma_{\bar{x}_d}$ does not fall in the region of rejection and the
null hypothesis is not rejected. That is, the sample mean difference
is not significantly different from zero, and the apparent difference
in employment may reasonably be attributed to the chance effects of
sampling.
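As a minimal sketch of the known-variance test just described (Python, standard library only; the function name `paired_z_test` is illustrative, not from the text), the standard error, the normal deviate, and the two-tailed .05 decision can be computed as follows:

```python
import math
from statistics import NormalDist


def paired_z_test(mean_diff, sigma, n, alpha=0.05):
    """Test a sample mean difference against zero when the population
    standard deviation of the differences (sigma) is known."""
    std_error = sigma / math.sqrt(n)                 # sigma / sqrt(N)
    z = mean_diff / std_error                        # standard normal under H0
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    return std_error, z, abs(z) > critical


std_error, z, reject = paired_z_test(mean_diff=10.5, sigma=50, n=10)
print(round(std_error, 1), round(z, 2), reject)  # 15.8 0.66 False
```

`NormalDist().inv_cdf(0.975)` returns about 1.95996, the 1.96 used in the text.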

Table 47. Individual Differences in Employment of 10 Industrial
Companies, 1929 and 1935

  Number employed      Difference in number     Differences
  1929      1935       employed, 1929-1935      squared
   110        48               62                 3,844
   166        93               73                 5,329
   130       120               10                   100
    95        85               10                   100
    36        52              -16                   256
   188       240              -52                 2,704
    85       135              -50                 2,500
   185       130               55                 3,025
   201       217              -16                   256
   144       115               29                   841
  Total                      +105                18,955

If the population standard deviation is not known, as is generally
the case, then it must be estimated from the sample. In
the present instance, the standard deviation of the sample
differences is equal to 42.25.* The best estimate, therefore, that
can be made of the standard deviation of population differences is
$\hat\sigma = \sqrt{10/9}\,(42.25)$, so that
$$\frac{\hat\sigma}{\sqrt{N}} = \frac{\sqrt{10/9}\,(42.25)}{\sqrt{10}} = 14.08 \qquad\text{and}\qquad \frac{\bar{x}_d}{\hat\sigma/\sqrt{N}} = \frac{10.5}{14.08} = .75$$
This last quantity, as noted above, is distributed like t. Since for
n = 9 the .025 points of the t distribution are $\pm 2.262$, values of t
numerically greater than 2.262 may be taken as a symmetrical region
of rejection with the coefficient of risk equal to .05. The sample
t = .75 obviously does not fall in this region of rejection. The null
hypothesis that the difference between the two samples is due to
chance cannot therefore be rejected, and the assumption that there
is no real difference in employment in the 2 years continues to be a
reasonable one.

* This is calculated by use of the short formula
$$\sigma^2 = \frac{\sum X^2}{N} - \bar{X}^2$$
which gives
$$\sigma^2 = \frac{18{,}955}{10} - (10.5)^2 = 1{,}785.25$$
so that $\sigma = 42.25$.
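The Table 47 computation can be reproduced in a short Python sketch. It assumes, as the text does, that σ is the sample standard deviation with divisor N (`statistics.pstdev`) and converts it to the estimate $\hat\sigma$ before forming t; the critical value 2.262 is hard-coded from the text rather than looked up, and `paired_t` is an illustrative name.

```python
import math
from statistics import mean, pstdev

# Table 47: employment in the same 10 companies in each year
emp_1929 = [110, 166, 130, 95, 36, 188, 85, 185, 201, 144]
emp_1935 = [48, 93, 120, 85, 52, 240, 135, 130, 217, 115]


def paired_t(x, y):
    """Return (mean difference, sample sd with divisor N, t, degrees of freedom)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    s = pstdev(d)                               # the text's sigma (divisor N)
    sigma_hat = math.sqrt(n / (n - 1)) * s      # estimate made independently of the mean
    t = mean(d) / (sigma_hat / math.sqrt(n))
    return mean(d), s, t, n - 1


d_bar, s, t, df = paired_t(emp_1929, emp_1935)
T_025_DF9 = 2.262  # .025 point of t for n = 9, as quoted in the text
print(d_bar, round(s, 2), round(t, 2), abs(t) > T_025_DF9)  # 10.5 42.25 0.75 False
```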


If there had been 50 companies instead of 10, the ratio
$\bar{x}_d/(\hat\sigma/\sqrt{N})$ could have been treated as if it were normally
distributed. For example, if the mean and standard deviation of a
sample of 50 differences equaled 10.5 and 42.25, then $\hat\sigma$ would
have the value $\sqrt{50/49}\,(42.25)$,
$$\frac{\hat\sigma}{\sqrt{N}} = \frac{\sqrt{50/49}\,(42.25)}{\sqrt{50}} = 6.03$$
and the ratio
$$\frac{\bar{x}_d}{\hat\sigma/\sqrt{N}} = \frac{10.5}{6.03} = 1.74$$
This is less than 1.96, so that, if the two .025 tails of the normal
curve are taken as the approximate region of rejection, the null
hypothesis would be accepted. It would be concluded once again that
employment among small firms in 1929 was not really greater than
employment among small firms in 1935.
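A sketch of the large-sample variant, under the same assumptions (Python, standard library; `large_sample_ratio` is an illustrative helper). With N = 50 the ratio is referred to the normal curve instead of t; the two-sided tail probability is computed only for orientation.

```python
import math
from statistics import NormalDist


def large_sample_ratio(mean_diff, s, n):
    """Ratio of the mean difference to its estimated standard error, where s is
    the sample standard deviation with divisor N (as in the text)."""
    sigma_hat = math.sqrt(n / (n - 1)) * s
    return mean_diff / (sigma_hat / math.sqrt(n))


ratio = large_sample_ratio(10.5, 42.25, 50)
critical = NormalDist().inv_cdf(0.975)                 # about 1.96
p_two_sided = 2 * (1 - NormalDist().cdf(abs(ratio)))   # normal approximation
print(round(ratio, 2), abs(ratio) > critical)  # 1.74 False
```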

It is interesting to note at this point that, when $X_1$ and $X_2$ are
correlated, the variance of the differences $X_1 - X_2$ is equal to the
variance of $X_1$ plus the variance of $X_2$ minus twice the product
of their standard deviations by the correlation between $X_1$ and $X_2$.
Symbolically,
$$\sigma^2_{X_1 - X_2} = \sigma^2_{X_1} + \sigma^2_{X_2} - 2\sigma_{X_1}\sigma_{X_2}r_{X_1X_2} \qquad (7)$$
Hence it follows that, in problems of correlated samples, the
variance of the differences is smaller the greater the positive
correlation between $X_1$ and $X_2$. The practical importance of this
conclusion is that differences in central tendency can be more
readily detected if the correlation between individual members of
the two samples is increased.
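Equation (7) can be checked numerically on the Table 47 data in a short Python sketch (standard library only; the function name is illustrative). The two years are strongly positively correlated (r is about .72), so the variance of the differences, 1,785.25, is far smaller than the sum of the two separate variances.

```python
from statistics import mean, pstdev, pvariance

emp_1929 = [110, 166, 130, 95, 36, 188, 85, 185, 201, 144]
emp_1935 = [48, 93, 120, 85, 52, 240, 135, 130, 217, 115]


def variance_of_difference(x, y):
    """Return (r, variance of X1 - X2 from Eq. (7), variance computed directly)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    r = cov / (pstdev(x) * pstdev(y))
    via_eq7 = pvariance(x) + pvariance(y) - 2 * pstdev(x) * pstdev(y) * r
    direct = pvariance([a - b for a, b in zip(x, y)])
    return r, via_eq7, direct


r, via_eq7, direct = variance_of_difference(emp_1929, emp_1935)
print(round(r, 3), round(via_eq7, 2), direct)
```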

DIFFERENCE BETWEEN TWO SAMPLE VARIANCES

Another problem that often arises in statistical analysis is to
determine whether two samples have come from populations with
different variances. To refer to a previous example, it might be
claimed that a new process for the manufacture of electric-light
bulbs will reduce the variability in length of life of the bulbs.
To check this claim two lots of bulbs could be produced, one by
the old process, the other by the new, and the difference in
variability could be subjected to a statistical test. Again, two
different brands of razor blades might be tested as to their
sharpness, and the variability in the two brands might be compared.
It is with such problems that the following sections are
concerned. As before, it will always be assumed that the
populations are normal; for nonnormal cases the reader is referred
to Chap. XVIII.

Testing a Difference in One Direction. The method of testing 
the difference between two sample variances varies to some 
extent with the particular problem involved. If the problem is 
concerned with testing a difference in one direction only, one 
method is applicable; if it is concerned with a difference in either 
direction, another method is required. The present section will
deal with the testing of a difference in one direction only. 

The Specific Problem. To keep the discussion concrete, consider
once again the data on industrial employment among small
companies. Suppose it is claimed that, owing to the instability 
introduced by the depression, there was a greater variability in 
the size of companies in 1935 than in 1929, size of companies 
being measured by amount of employment. The variance in 
employment among the ten 1929 companies, it will be recalled, 
was 9,834.5, and the variance among the ten 1935 companies was 
18,832.1, which would seem offhand to substantiate this claim. 
The immediate statistical problem is to determine whether this 
apparent difference in variability is great enough to be attributed 
to some specific causal factor such as the depression or whether 
such a difference could reasonably be attributed to the chance 
effects of sampling. 

The Null Hypothesis. In carrying out this statistical test the 
first step is to set up the null hypothesis that the difference in 
variance is due, not to some specific causal factor, but only to 
chance. In other words, the hypothesis states that the 1929 and 
1935 populations have the same variance. It will be noted that 
it does not state that the populations are the same, for nothing 
is said about the means of the populations. The following test is 
therefore independent of whether the means of the populations
are the same or not. It thus differs from tests of the difference 
between two means, which are based upon the assumption that 
the variances are the same. 

It should be noted also that the hypothesis assumes that the 
samples are independent of each other. If the data on employment
pertain to the same companies in both years, then the
samples would not be independent of each other and the following 
analysis could not be validly used. 

The Statistic. The statistic that is most convenient to use in the
present instance is the ratio of the maximum-likelihood estimates
of the population variances in the 2 years. The maximum-likelihood
estimate of the population variance made independently from the
first sample¹ is $N_1\sigma_1^2/(N_1 - 1)$, and the maximum-likelihood
estimate of the population variance made independently from the
second sample is $N_2\sigma_2^2/(N_2 - 1)$, where $\sigma_1^2$ and $\sigma_2^2$ are the two
sample variances and $N_1$ and $N_2$ are the numbers of cases in the two
samples.

The statistic that is used for this problem is the ratio of these
two maximum-likelihood estimates, i.e.,
$$\frac{\sigma_1^2N_1/(N_1 - 1)}{\sigma_2^2N_2/(N_2 - 1)} \qquad (8)$$
If the two populations have identical variances, these two
estimates will be approximately equal and the above statistic will be
close to 1. The statistical problem is to determine whether it
differs from unity by an amount greater than can reasonably be
attributed to chance.
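A minimal Python sketch of Eq. (8), assuming sample variances computed with divisor N as in the text; `ml_variance_ratio` is an illustrative name. The second call uses a hypothetical sample size of 25 purely to show that, with unequal N's, the statistic no longer equals the plain ratio of sample variances.

```python
def ml_variance_ratio(var1, n_cases1, var2, n_cases2):
    """Eq. (8): ratio of the estimates N * sigma^2 / (N - 1), where sigma^2 is the
    sample variance with divisor N. Returns (ratio, n1, n2) with n = N - 1."""
    est1 = n_cases1 * var1 / (n_cases1 - 1)
    est2 = n_cases2 * var2 / (n_cases2 - 1)
    return est1 / est2, n_cases1 - 1, n_cases2 - 1


# 1935 sample (presumably larger variance) in the numerator
F, n1, n2 = ml_variance_ratio(18832.1, 10, 9834.5, 10)
print(round(F, 3), n1, n2)  # 1.915 9 9

# Hypothetical unequal sizes: the statistic now differs from 18,832.1 / 9,834.5
F_unequal, _, _ = ml_variance_ratio(18832.1, 10, 9834.5, 25)
```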

The Sampling Distribution of the Ratio of Two Maximum-likelihood
Estimates of Variance. The reason why the ratio of the two
estimates of variance is used rather than their arithmetic difference
is that the sampling distribution of the former can more
readily be determined. Mathematical analysis shows that this
sampling distribution is of the form of the F distribution with
$n_1$ and $n_2$ of the F equation equal to $N_1 - 1$ and $N_2 - 1$,
respectively. As in other cases, $n_1$ is the degrees of freedom involved
in estimating $\sigma_1^2$ and $n_2$ the degrees of freedom involved in
estimating $\sigma_2^2$.

¹ That is, independently of the mean.

That is, if pairs of samples are drawn independently from populations
with identical variances and if the ratio of the maximum-likelihood
estimates of variance is calculated for each pair, these
ratios will have a frequency distribution that is of the form of the
F distribution, with $n_1 = N_1 - 1$ and $n_2 = N_2 - 1$. It might
be well for the reader to turn back at this point to Chap. VI and
reread the section there on the F distribution. It is also well to
note again that this sampling distribution is valid only for ratios
that are computed from independent samples.

The Region of Rejection. The present problem is concerned
with whether the 1935 variance is greater than the 1929 variance.
If the statistic selected is the ratio of the 1935 variance to the
1929 variance, then the best region of rejection to employ is the
upper tail of the F distribution. For it has been shown¹ that, if
the presumably larger variance is actually the larger, there will
be more chance of rejecting the null hypothesis if the upper tail
of the F distribution is used than if any other region is adopted.
Of course, it would be possible to take the ratio of the presumably
smaller variance to the presumably larger one, and in this case
the lower tail of the F distribution would be the more appropriate
one to employ. This alternative course is not followed, however,
for the tables of the F distribution are computed only for the
upper tail and, as noted, the distribution is not symmetrical. In
the present instance the upper tail of the F distribution will be
employed as the region of rejection, and, as usual, the size of this
region will be taken as .05.

For the given problem, the maximum-likelihood estimate of the
population variance based upon the 1935 sample is $(10)(18{,}832.1)/9$,
and the maximum-likelihood estimate of the population variance
based upon the 1929 sample is $(10)(9{,}834.5)/9$. The ratio of the
first to the second is 1.915. In this problem, this is the same as
the ratio of the two sample variances themselves, since the
samples are of the same size. In another problem in which the
samples are of different sizes, this equality would not exist.

¹ Cf. Neyman, J., and E. S. Pearson, “On the Problem of the Most Efficient
Tests of Statistical Hypotheses,” Philosophical Transactions of the Royal
Society of London, Series A, Vol. 231 (1933), pp. 289-337.
Testing the Hypothesis. To test the given hypothesis it is
necessary merely to note whether the above ratio is greater or less
than the .05 point of the F distribution for which $n_1 = 9$ and
$n_2 = 9$.* The tables, unfortunately, do not give the .05 point
of this particular F distribution, so that its value must be
interpolated. For $n_1 = 8$, $n_2 = 9$, the .05 point is F = 3.23; for
$n_1 = 12$, $n_2 = 9$, the .05 point is 3.07. Therefore, for the F
curve for which $n_1 = 9$ and $n_2 = 9$, the .05 point must lie between
these two values. Whatever its exact value, it is clear that the
sample ratio of 1.915 is well within the region of acceptance; that is,
the sample value does not fall in the region of rejection, which has
been taken to consist of all values of F equal to or greater than
the .05 value. The null hypothesis is thus not rejected in the
present problem, and the difference in sample variance may
reasonably be attributed to chance.

If neither $n_1$ nor $n_2$ has a value that is given directly in the F
table, it would be necessary to interpolate in both directions.
For example, if $n_1 = 9$ and $n_2 = 40$ for the given samples, then
it would be necessary to find the following tabled values (those
shown are the .01 points of F; the procedure is the same at the
.05 level):

  For                      The .01 point is
  n1 = 8 and n2 = 30            3.173
  n1 = 12 and n2 = 30           2.843
  n1 = 8 and n2 = 60            2.823
  n1 = 12 and n2 = 60           2.496

It is obvious that the .01 point for $n_1 = 9$, $n_2 = 40$ lies somewhere
between 3.173 and 2.496. If the given sample ratio lies beyond 3.173,
it will certainly lie beyond the .01 value for $n_1 = 9$ and $n_2 = 40$.
Similarly, if the sample ratio lies inside 2.496, it certainly lies
within the .01 point for $n_1 = 9$ and $n_2 = 40$.

* It will be noted that $n_1 = N_1 - 1$ and that $N_1$ refers to the size of the
sample whose variance is put on top of the fraction expressing the ratio,
in this case the 1935 sample. Thus $N_1$ refers to the sample that has the
presumably larger of the two estimates of variance. Likewise, $n_2 = N_2 - 1$,
where $N_2$ refers to the size of the sample whose variance is put in the
denominator of the fraction.

In such cases there is no need for exact interpolation; but if the
sample ratio lies between 3.173 and 2.496, then the .01 point for
$n_1 = 9$ and $n_2 = 40$ may be roughly obtained by straight-line
interpolation.¹ First interpolate for $n_2 = 40$. Since the .01 point for
$n_1 = 8$, $n_2 = 30$ is 3.173 and the .01 point for $n_1 = 8$, $n_2 = 60$ is
2.823, and since 40 is distant from 30 by 1/3 of the distance between
30 and 60, the .01 point for $n_1 = 8$, $n_2 = 40$ will be approximately
equal to
$$3.173 - \tfrac{1}{3}(3.173 - 2.823) = 3.056$$
Likewise, the .01 point for $n_1 = 12$, $n_2 = 40$ will be approximately
equal to
$$2.843 - \tfrac{1}{3}(2.843 - 2.496) = 2.727$$
Values for $n_1 = 8$, $n_2 = 40$ and $n_1 = 12$, $n_2 = 40$ having been
obtained, it remains to interpolate for $n_1 = 9$, which lies 1/4 of the
way from 8 to 12. The .01 point for $n_1 = 9$, $n_2 = 40$ will thus be
approximately equal to
$$3.056 - \tfrac{1}{4}(3.056 - 2.727) = 2.974$$
This is the point desired. As noted above, straight-line
interpolation is not perfectly accurate. Consequently, if a
sample value falls close to the interpolated value, it is well to
forgo any definite conclusion. In such an event it is better to
judge the result a borderline case.
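The two-stage straight-line interpolation can be written as a small Python function (a sketch; `bilinear_f_point` is an illustrative name). It takes the four tabled F points listed above, interpolates first in $n_2$ and then in $n_1$, and inherits the inaccuracy of straight-line interpolation noted in the text.

```python
def bilinear_f_point(n1, n2, n1_lo, n1_hi, n2_lo, n2_hi, table):
    """Straight-line interpolation in both directions.
    table maps (n1, n2) -> tabled percentage point of F."""
    # Step 1: interpolate along n2 for each bracketing value of n1
    w2 = (n2 - n2_lo) / (n2_hi - n2_lo)
    at_n1_lo = table[(n1_lo, n2_lo)] + w2 * (table[(n1_lo, n2_hi)] - table[(n1_lo, n2_lo)])
    at_n1_hi = table[(n1_hi, n2_lo)] + w2 * (table[(n1_hi, n2_hi)] - table[(n1_hi, n2_lo)])
    # Step 2: interpolate along n1
    w1 = (n1 - n1_lo) / (n1_hi - n1_lo)
    return at_n1_lo + w1 * (at_n1_hi - at_n1_lo)


tabled = {(8, 30): 3.173, (12, 30): 2.843, (8, 60): 2.823, (12, 60): 2.496}
approx = bilinear_f_point(9, 40, 8, 12, 30, 60, tabled)
print(round(approx, 3))  # 2.974
```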

Testing a Difference in Either Direction. The Problem. The
foregoing test of difference between two sample variances was
based upon the assumption that the investigator was interested
in a significant difference in one direction only. It was on this
basis, it will be recalled, that the upper tail of the F distribution
was selected as the region of rejection. There may be cases,
however, in which the investigator is indifferent as to whether any
difference that might exist is in one direction or the other. This
is the problem that will now be considered.

The Test. It might be thought offhand that, when the investigator
is indifferent as to the way in which the two variances might
differ, a satisfactory test could be devised by distributing the
region of rejection equally between the two tails of the F
distribution. This is not true; for such a test is biased in that the
probability of accepting the null hypothesis in such an instance may
be greater in some cases in which the null hypothesis is not true
than when it is true. The test described below avoids this
difficulty. It is unbiased in that the probability of accepting the
null hypothesis when it is true is greater than for any instance
when it is not true and in that the probability of rejecting the null
hypothesis when it is true is less than for any instance when it is
not true.

¹ Better results may possibly be obtained by taking the variable
$$u = \frac{n_1F}{n_1F + n_2}$$
and using tables of the incomplete beta function. Cf. Rider, Paul R.,
Introduction to Modern Statistical Methods (1939), p. 119.

Let two samples be taken from normal populations. Let the
variance of one sample be $\sigma_1^2$ and the variance of the other sample
be $\sigma_2^2$. The investigator, it will be presumed, is interested in
testing the null hypothesis that the two population variances are
the same, and he is indifferent as to whether any possible difference
between them is in one direction or the other. On these
assumptions, the best statistic to employ is
$$L = \frac{(n_1 + n_2)\log_e\hat\sigma^2 - n_1\log_e\hat\sigma_1^2 - n_2\log_e\hat\sigma_2^2}{1 + a} \qquad (9)$$
in which $n_1 = N_1 - 1$, $n_2 = N_2 - 1$,
$$\hat\sigma^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2}{N_1 + N_2 - 2}$$
(that is, $\hat\sigma^2$ is the maximum-likelihood estimate of the population
variance based upon the two sample variances taken together),
$$\hat\sigma_1^2 = \frac{N_1\sigma_1^2}{N_1 - 1} \qquad\text{and}\qquad \hat\sigma_2^2 = \frac{N_2\sigma_2^2}{N_2 - 1}$$
(the maximum-likelihood estimates of the population variance based
upon the two sample variances separately), and
$$a = \frac{1}{3}\left(\frac{1}{N_1 - 1} + \frac{1}{N_2 - 1} - \frac{1}{N_1 + N_2 - 2}\right)$$
The statistic shown in Eq. (9) has been found to have a sampling
distribution that is approximately of the form of the $\chi^2$
distribution with the degrees of freedom n equal to 1. It leads
to an unbiased test if the upper tail of the distribution is taken
as the region of rejection.¹

An Example. To illustrate this test, consider again the variance
in employment among small industrial companies in 1929
and 1935. The variance of a sample of 10 companies from the
first year, it will be recalled, was 9,834.5 and the variance of a
sample of 10 companies from the second year was 18,832.1. The
question is: Do these two sample variances differ sufficiently to
indicate that the population variances are not the same? This
will be answered by setting up the null hypothesis that the
populations have the same variance and then determining
whether this hypothesis can reasonably be accepted on the basis
of the sample results. In making this test it will be presumed that
the investigator is indifferent as to whether any possible difference
between the variances is in one direction or the other.

¹ Cf. Pitman, E. J. G., “Tests of Hypotheses Concerning Location and
Scale Parameters,” Biometrika, Vol. 31 (1939), pp. 200-215. The L used
above is equal to Pitman's 2L and is not the same as his L.

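The quantities required for the calculation of the statistic L
are
$$\log_e\hat\sigma^2 = \log_e\frac{10(9{,}834.5) + 10(18{,}832.1)}{10 + 10 - 2} = 2.30259\log_{10}15{,}926 = 9.675730$$
$$\log_e\hat\sigma_1^2 = \log_e\frac{N_1\sigma_1^2}{N_1 - 1} = 2.30259\log_{10}\frac{10(9{,}834.5)}{9} = 9.299017$$
$$\log_e\hat\sigma_2^2 = \log_e\frac{N_2\sigma_2^2}{N_2 - 1} = 2.30259\log_{10}\frac{10(18{,}832.1)}{9} = 9.948582$$
$$a = \frac{1}{3}\left(\frac{1}{9} + \frac{1}{9} - \frac{1}{18}\right) = \frac{1}{18}$$
Then
$$\text{Numerator of } L = (9 + 9)(9.675730) - [9(9.299017) + 9(9.948582)] = 174.163140 - 173.228391 = .934749$$
and L equals this number divided by 1 + a, that is,
$$L = \frac{.934749}{1 + \frac{1}{18}} = .88555$$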


The coefficient of risk will be taken equal to .05 as previously,
and the region of rejection will be the upper .05 tail of the $\chi^2$
distribution for n = 1. The table of the $\chi^2$ distribution in the
Appendix (Table VIII) shows that the lower limit of this region
is 3.841. Since the value of L is well below this limit, the null
hypothesis is not rejected and it is concluded that the two
samples may reasonably have come from populations with the
same variance.
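A Python sketch of Eq. (9), standard library only; it uses natural logarithms directly instead of 2.30259 times the common logarithm, and the helper name is illustrative. Carrying full precision gives L of about .884, which agrees with the hand computation above to two decimals (the small difference comes from rounding in the hand computation). For two samples this statistic has the same form as what is commonly called Bartlett's test of homogeneity of variances.

```python
import math

CHI2_05_DF1 = 3.841  # upper .05 point of chi-square for n = 1, as quoted in the text


def two_sample_L(var1, n_cases1, var2, n_cases2):
    """Eq. (9). var1 and var2 are sample variances computed with divisor N."""
    n1, n2 = n_cases1 - 1, n_cases2 - 1
    pooled = (n_cases1 * var1 + n_cases2 * var2) / (n1 + n2)   # N1 + N2 - 2 = n1 + n2
    est1 = n_cases1 * var1 / n1
    est2 = n_cases2 * var2 / n2
    numerator = ((n1 + n2) * math.log(pooled)
                 - n1 * math.log(est1) - n2 * math.log(est2))
    a = (1 / n1 + 1 / n2 - 1 / (n1 + n2)) / 3
    return numerator / (1 + a)


L = two_sample_L(9834.5, 10, 18832.1, 10)
print(round(L, 3), L > CHI2_05_DF1)  # 0.884 False
```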


ARE TWO SAMPLES FROM THE SAME POPULATION? 

The Problem. The problems so far considered have been concerned
with whether the normal populations from which two
samples have been drawn differ with respect to a single
characteristic, for example, with respect to their means or with respect
to their variances. The questions posed were: (1) On the
assumption that the variances of the populations are the same,
are the means also the same? (2) Without any assumption
regarding the means of the populations, are their variances the
same? The question now to be raised is: Are both the means
and variances the same? This question differs from (1) in that
there the variances were assumed to be the same, whereas in the
new question the equality of the variances is part of the hypothesis
to be tested.

For example, suppose a manufacturer of automobile tires is
comparing two processes. If he knows or has good reason to
believe that the variability in mileage is the same for tires
manufactured by one process as for those manufactured by the other,
and if he is interested only in a possible difference in average
mileage, he will use one of the procedures above for testing
the difference between means. If he does not care about a possible
difference in average mileage but is interested only in a possible
difference in variability, he will use one of the procedures for
testing the difference between variances. Finally, if he is
interested in whether the tires manufactured by the two processes
differ either with respect to average mileage or with respect
to variability, or with respect to both, he will employ the test
outlined below.

The Test. When the populations are normal, the best test
that can be made of joint equality of means and variances appears
to be offered by the statistic¹
$$\lambda_H = \frac{\sigma_1^{N_1}\,\sigma_2^{N_2}}{\sigma_0^{N_1 + N_2}} \qquad (10)$$
in which $\sigma_0$ is the standard deviation of the two samples treated
as a single sample and $\sigma_1$ and $\sigma_2$ are the two individual sample
standard deviations.² The value of $\sigma_0^2$, and hence of $\sigma_0$, may be
found by throwing the two samples together and calculating the
value of $\sum(X - \bar{X})^2/(N_1 + N_2)$, where $\bar{X}$ is the mean of the
combined samples. If only the values of the sample means and
standard deviations are known, $\sigma_0^2$ may be computed from the
relationship
$$\sigma_0^2 = \frac{N_1\sigma_1^2 + N_2\sigma_2^2}{N_1 + N_2} + \frac{N_1N_2}{(N_1 + N_2)^2}(\bar{X}_1 - \bar{X}_2)^2 \qquad (11)$$
Table 48 shows the lower¹ .05 and .01 points of the sampling
distribution of $\lambda_H$ for various values of $N_1$ and $N_2$. The use
of this distribution in testing a joint hypothesis will now be
illustrated.

¹ The symbol $\lambda_H$ is used in the original article and is continued here.
The H has no significance other than to distinguish the statistic.

² Neyman, J., and E. S. Pearson, “On the Problem of Two Samples,”
Bulletin international de l'Académie polonaise des sciences et des lettres,
Classe des sciences mathématiques et naturelles, Série A: Sciences
mathématiques (1930), pp. 73-96.

Table 48. Approximate Values of the Lower .05 and .01 Points of the
Sampling Distribution of $\lambda_H$ for Selected Values of $N_1$ and $N_2$

                          Values of N2
  N1   P(λH ≤ λ)     5       10      20      50      ∞
   5     .05      .0167   .0222   .0241   .0247   .0248
         .01      .0019   .0029   .0033   .0034   .0034
  10     .05      .0222   .0312   .0349   .0364   .0368
         .01      .0029   .0048   .0058   .0061   .0062
  20     .05      .0241   .0349   .0401   .0425   .0432
         .01      .0033   .0058   .0071   .0078   .0080
  50     .05      .0247   .0364   .0425   .0459   .0473
         .01      .0034   .0061   .0078   .0088   .0092
   ∞     .05      .0248   .0368   .0432   .0473   .0500
         .01      .0034   .0062   .0080   .0092   .0100

Adapted by permission from J. Neyman and E. S. Pearson, “On the Problem of
Two Samples,” Bulletin international de l'Académie polonaise des sciences et
des lettres, Classe des sciences mathématiques et naturelles, Série A:
Sciences mathématiques (1930), p. 92, Tables II and III.

An Example. Consider once again the employment data of
small industrial companies in 1929 and 1935. The two samples
of 10 already analyzed for differences in means and variances
had means of 89.1 and 123.5 and variances of 9,834.5 (= 99.16²)
and 18,832.1 (= 137.2²). The question is: In view of these
sample results, could these two samples reasonably have come
from the same population? To answer this question the null
hypothesis is set up that the populations are the same, and the
value of $\lambda_H$ is calculated. For the given data, the value of $\sigma_0^2$ is
$$\sigma_0^2 = \frac{(10)(9{,}834.5) + (10)(18{,}832.1)}{10 + 10} + \frac{(10)(10)}{(10 + 10)^2}(89.1 - 123.5)^2 = 14{,}629.1$$
which gives $\sigma_0 = 120.9$. Hence,
$$\log\lambda_H = (10)(\log 99.16 - \log 120.9) + (10)(\log 137.2 - \log 120.9) = -.31170 = 9.68830 - 10$$
and
$$\lambda_H = .4879$$
Table 48 indicates that, the greater the difference between the
means and variances, the smaller the value of $\lambda_H$; that is, small
values of $\lambda_H$ are significant. Since the foregoing value of $\lambda_H$ is
larger than either the .05 or the .01 value for $N_1 = 10$ and
$N_2 = 10$ (.0312 and .0048), the null hypothesis is not rejected. In
other words, there is little reason in this case to believe that the
two samples are not from identical populations.

¹ The greater the differences, the smaller the values of λ.
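A Python sketch of Eqs. (10) and (11), working from summary statistics (variances with divisor N; the helper name is illustrative). Computing on the log scale with unrounded standard deviations gives $\lambda_H$ of about .485, slightly below the .4879 obtained above with $\sigma_0$ rounded to 120.9; both are far above the tabled .0312 and .0048.

```python
import math


def lambda_h(mean1, var1, n1, mean2, var2, n2):
    """Eqs. (10)-(11) from summary statistics; var1 and var2 use divisor N."""
    total = n1 + n2
    var0 = ((n1 * var1 + n2 * var2) / total
            + n1 * n2 / total ** 2 * (mean1 - mean2) ** 2)        # Eq. (11)
    # lambda_H = sigma1**N1 * sigma2**N2 / sigma0**(N1 + N2), on the log scale
    log_lam = 0.5 * (n1 * math.log(var1 / var0) + n2 * math.log(var2 / var0))
    return var0, math.exp(log_lam)


var0, lam = lambda_h(89.1, 9834.5, 10, 123.5, 18832.1, 10)
LOWER_05, LOWER_01 = 0.0312, 0.0048   # Table 48, N1 = N2 = 10
print(round(var0, 1), round(lam, 3), lam <= LOWER_05)  # 14629.1 0.485 False
```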

In conclusion, it should be noted that, if this test should show a
significant difference between the two samples, there is nothing 
in the result itself that will tell whether the difference lies in the 
means or in the variances, or in both. In fact, the result is 
purely a joint product; for the significance of the difference 
between the means will depend to some extent on the amount of 
difference in the variances, and vice versa. In the present 
instance, the individual tests showed no significant difference 
between either the means or the variances, and it could not 
therefore be expected that the two samples as a whole would 
be deemed significantly different. In some cases, however, one 
of the individual tests might show a significant difference, while 
the joint test would fail to show any difference. As suggested 
above, the conclusions to be drawn from the various tests 
depend entirely upon the problem. Each type of problem has 
its own appropriate test and should not be confused with tests 
appropriate for other types of problem.
DIFFERENCE BETWEEN TWO INDEPENDENTLY DERIVED
CORRELATION COEFFICIENTS

The difference between two sample correlation coefficients can
readily be tested by making use of the z transformation.¹ A
sample z, corresponding to a sample r, it will be recalled, has a
sampling distribution that is approximately normal in form,
with a variance equal to $1/(N - 3)$. Accordingly, the difference
between two sample z's is also practically normal in form with a
variance equal to²
$$\sigma^2_{z_{12} - z'_{12}} = \frac{1}{N - 3} + \frac{1}{N' - 3} \qquad (12)$$
Thus to test whether $z_{12}$ is significantly different from $z'_{12}$ it is
merely necessary to note whether³
$$\frac{|z_{12} - z'_{12}|}{\sqrt{\dfrac{1}{N - 3} + \dfrac{1}{N' - 3}}} \ge 1.96$$
The problem is the same as that of the difference between two sample
means when the variances are known. Thus, if $r_{12} = .7658$ and
$r'_{12} = .7398$, and hence $z_{12} = 1.01$ and $z'_{12} = .95$, and if N = 40
and N' = 60, the quantity
$$\frac{1.01 - .95}{\sqrt{\dfrac{1}{40 - 3} + \dfrac{1}{60 - 3}}} = \frac{.06}{.211}$$
would be equal to about .28. Since this is less than 1.96, the
difference between the two correlation coefficients $r_{12}$ and $r'_{12}$
cannot be judged significant.
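A Python sketch of the z-transformation test (standard library; `compare_correlations` is an illustrative name). `math.atanh(r)` equals Fisher's $z = \tfrac{1}{2}\log_e[(1 + r)/(1 - r)]$. With the figures of the example, the ratio comes out at about .28.

```python
import math


def compare_correlations(r1, n1, r2, n2, critical=1.96):
    """Test r1 against r2 through Fisher's z transformation (Eq. (12))."""
    z1, z2 = math.atanh(r1), math.atanh(r2)       # z = (1/2) ln((1 + r) / (1 - r))
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))   # standard error of z1 - z2
    ratio = (z1 - z2) / se
    return z1, z2, ratio, abs(ratio) >= critical


z1, z2, ratio, significant = compare_correlations(0.7658, 40, 0.7398, 60)
print(round(z1, 2), round(z2, 2), round(ratio, 2), significant)  # 1.01 0.95 0.28 False
```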


¹ See Chap. XII.

² If two normally distributed variables are independent, their difference
is normally distributed and has a variance that is the sum of the variances
of the two variables (cf. Appendix to this chapter, pp. 419-421).

³ This assumes a region of rejection equally distributed at each end.

DIFFERENCE BETWEEN TWO INDEPENDENTLY DERIVED
REGRESSION PARAMETERS

The difference between two independently derived regression
parameters can be tested in the same way as the difference
between two means. First, the sample higher-order variances
are pooled to give an estimate of the population higher-order
variance. Thus
$$\hat\sigma^2_{1.23\ldots} = \frac{N'(\sigma'_{1.23\ldots})^2 + N''(\sigma''_{1.23\ldots})^2}{N' + N'' - 2k} \qquad (13)$$
where k is the number of regression statistics in each regression
equation. Then the standard error of the difference between the two
regression parameters, say, between $b'_{12.3\ldots}$ and $b''_{12.3\ldots}$, would
be given by the square root of
$$\sigma^2_{b'_{12.3\ldots} - b''_{12.3\ldots}} = \sigma^2_{b'_{12.3\ldots}} + \sigma^2_{b''_{12.3\ldots}} \qquad (14)$$
where $\sigma^2_{b'_{12.3\ldots}} = \hat\sigma^2_{1.23\ldots}/[N'(\sigma'_{2.3\ldots})^2]$, and similarly for $b''_{12.3\ldots}$.
The test of the difference could then be made by comparing
the difference between the b's with the standard error of the
difference, using the t distribution if the samples are small or the
normal curve if the samples are large.

For example, suppose that, in a given problem, N' = 50,
$b'_{12.3} = 2.7$, $(\sigma'_{1.23})^2 = 25$, and $(\sigma'_{2.3})^2 = 43$, while N'' = 70,
$b''_{12.3} = 3.1$, $(\sigma''_{1.23})^2 = 36$, and $(\sigma''_{2.3})^2 = 48$. Then
$$\hat\sigma^2_{1.23} = \frac{(50)(25) + (70)(36)}{50 + 70 - 6} = 33.07$$
since k = 3, which is the number of regression statistics in the
regression equation. Also,
$$\sigma^2_{b'_{12.3}} = \frac{33.07}{(50)(43)} = .0154 \qquad\text{and}\qquad \sigma^2_{b''_{12.3}} = \frac{33.07}{(70)(48)} = .0098$$
Hence,
$$\sigma^2_{b'_{12.3} - b''_{12.3}} = .0154 + .0098 = .0252 \qquad\text{and}\qquad \sigma_{b'_{12.3} - b''_{12.3}} = \sqrt{.0252} = .16$$
The difference between $b'_{12.3}$ and $b''_{12.3}$ is $-.4$, and the ratio of this
difference to its $\sigma$ is $-.4/.16 = -2.5$. Since the samples are large,
the sampling distribution of $b'_{12.3} - b''_{12.3}$ may be assumed to be
normal. Hence a deviate of $-2.5$ lies beyond both the .05 point
and the .025 point (i.e., beyond $-1.645$ and beyond $-1.96$), and
the hypothesis that $b'_{12.3}$ and $b''_{12.3}$ came from the same population
is to be rejected. That is, $b'_{12.3}$ must be deemed significantly
different from $b''_{12.3}$.
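Eqs. (13) and (14) and the example above can be sketched in Python as follows (illustrative helper; `resid_var` stands for $(\sigma_{1.23\ldots})^2$ and `pred_var` for $(\sigma_{2.3\ldots})^2$, both with divisor N). Carrying full precision gives a standard error of about .159 and a ratio of about -2.52, in line with the rounded .16 and -2.5.

```python
import math


def compare_b(b1, n1, resid_var1, pred_var1, b2, n2, resid_var2, pred_var2, k):
    """Eqs. (13)-(14): pooled residual variance and standardized difference
    of two independently derived regression coefficients."""
    pooled = (n1 * resid_var1 + n2 * resid_var2) / (n1 + n2 - 2 * k)   # Eq. (13)
    var_b1 = pooled / (n1 * pred_var1)
    var_b2 = pooled / (n2 * pred_var2)
    se_diff = math.sqrt(var_b1 + var_b2)                              # Eq. (14)
    return pooled, se_diff, (b1 - b2) / se_diff


pooled, se_diff, ratio = compare_b(2.7, 50, 25, 43, 3.1, 70, 36, 48, k=3)
print(round(pooled, 2), round(se_diff, 3), round(ratio, 2))  # 33.07 0.159 -2.52
```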


APPENDIX 

THE MEAN AND VARIANCE OF THE SUM OR DIFFERENCE OF 
TWO VARIABLES 

I. The mean. Let Z be equal to the sum (or difference) of the 
variables X and Y so that Z iS = ± 7/. Let X take on the 

values Xi, X 2 , and X 8 and Y the values Y i, Y 2 , and F 3 ; and let 
the joint relative frequencies, or probabilities, of pairs of A and 
Y be as follows: 


y 3 

~~y 2 

’ r i ’ 

Pn 

pn 

Pn 

pn 

Pn 

P 21 

Pzz 

Pn 

1 Pn 


Xi 

: x 2 

j x 3 


Thus pi 2 means the probability of an XiY 2 combination, pn 
the probability of an X 8 Y i combination, etc. In the interests 
of simplicity, only three values are taken for each variable. The 
argument is equally valid, however, for any number of values 
for each variable and for continuous as well as for discrete 

distributions. ■■ 

Since Za = Xi ± Yj, the various values of Z and their prob¬ 
abilities are as follows: 



P(Z) 

z 

P(Z) 

z 

P(Z) 

Xi 

Xi 

Xi 

± Yi 
± t 2 

± Y 3 

pn , 

pn 

pn 

X 2 ± Yi 
Lib 
X 2 ± Yz 

23 21 

23 2 2 

23 2 3 

. 

X 3 ± Y i 

X , ± Y , 
X, ± Y s 

Pzi 

PZ2 

Pzz 


This is the distribution of Z. 

The individual distributions of X and Y are as follows: 


X 

P(.X) 

Y 


Xi 

II 

+ 

+ 

OS 

Yi 

f 

Pi 

f 

X* 

232 = 2321 + 2322 + 2323 

Y 2 

Pj 

Xz 

Pz = 2331 + 2332 + 2333 

7 3 

Pz 


P(Y) 


= Pn + P 21 + Pn 
= p 12 + 2322+ Pn 
— pn + Pn + Pn 
The problem of this section is to express the mean of Z in
terms of the means of X and Y.

By definition, $\bar{Z} = \sum P(Z)Z$, and when written out in full this
becomes

$$
\begin{aligned}
\bar{Z} = {} & p_{11}(X_1 \pm Y_1) + p_{21}(X_2 \pm Y_1) + p_{31}(X_3 \pm Y_1) \\
& + p_{12}(X_1 \pm Y_2) + p_{22}(X_2 \pm Y_2) + p_{32}(X_3 \pm Y_2) \\
& + p_{13}(X_1 \pm Y_3) + p_{23}(X_2 \pm Y_3) + p_{33}(X_3 \pm Y_3)
\end{aligned}
$$

Upon removal of parentheses and collection of common terms, the
mean of Z is seen to be equal to

$$
\begin{aligned}
\bar{Z} = {} & \left[(p_{11} + p_{12} + p_{13})X_1 + (p_{21} + p_{22} + p_{23})X_2 + (p_{31} + p_{32} + p_{33})X_3\right] \\
& \pm \left[(p_{11} + p_{21} + p_{31})Y_1 + (p_{12} + p_{22} + p_{32})Y_2 + (p_{13} + p_{23} + p_{33})Y_3\right]
\end{aligned}
$$

Hence

$$
\bar{Z} = (p_1X_1 + p_2X_2 + p_3X_3) \pm (p'_1Y_1 + p'_2Y_2 + p'_3Y_3) = \bar{X} \pm \bar{Y}
$$

That is, the mean of the sum or difference of two variables is the sum
or difference of their means. (This is true whether the variables
are independent or correlated.)
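As a numerical check on this result, the Python sketch below builds a 3 × 3 joint probability table. The values of $X_i$, $Y_j$ and $p_{ij}$ are invented for illustration and are not taken from the text. The sketch computes $\bar{Z}$ directly from the distribution of Z and compares it with $\bar{X} \pm \bar{Y}$. The table is deliberately not the product of its margins, so X and Y are correlated. Exact fractions avoid rounding error.

```python
from fractions import Fraction

# Hypothetical values of X and Y and a joint table p[i][j] = P(X_i and Y_j).
# The numbers are illustrative only; any table whose entries sum to 1 will do.
X = [Fraction(2), Fraction(5), Fraction(9)]
Y = [Fraction(1), Fraction(4), Fraction(6)]
p = [[Fraction(1, 10), Fraction(1, 20), Fraction(1, 20)],
     [Fraction(1, 20), Fraction(3, 10), Fraction(1, 10)],
     [Fraction(1, 20), Fraction(1, 10), Fraction(1, 5)]]
assert sum(sum(row) for row in p) == 1

def mean_of_combination(sign):
    """Mean of Z = X + sign*Y computed from the joint distribution of Z."""
    return sum(p[i][j] * (X[i] + sign * Y[j]) for i in range(3) for j in range(3))

# Marginal probabilities p_i (for X, row sums) and p'_j (for Y, column sums).
p_x = [sum(p[i][j] for j in range(3)) for i in range(3)]
p_y = [sum(p[i][j] for i in range(3)) for j in range(3)]
mean_x = sum(pi * xi for pi, xi in zip(p_x, X))
mean_y = sum(pj * yj for pj, yj in zip(p_y, Y))

for sign in (1, -1):
    # Mean of a sum (difference) equals the sum (difference) of the means.
    assert mean_of_combination(sign) == mean_x + sign * mean_y
print(mean_x, mean_y, mean_of_combination(1), mean_of_combination(-1))
```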

II. Variance. Let X and Y both be measured from their
mean values, and let z be the sum (or difference) of X and Y when
so measured. Hence $z = x \pm y$. From I, it follows that
$\bar{z} = \bar{x} \pm \bar{y}$; and since $\bar{x} = \bar{y} = 0$, $\bar{z}$ also equals 0. Thus the
variances of the variables become

$$
\sigma_x^2 = \sum_i p_i x_i^2, \qquad \sigma_y^2 = \sum_j p'_j y_j^2, \qquad \sigma_z^2 = \sum_i \sum_j p_{ij} z_{ij}^2
$$

The problem is to express $\sigma_z^2$ in terms of $\sigma_x^2$ and $\sigma_y^2$.

When written out in full, $\sigma_z^2$ is as follows:

$$
\begin{aligned}
\sigma_z^2 = {} & p_{11}(x_1 \pm y_1)^2 + p_{21}(x_2 \pm y_1)^2 + p_{31}(x_3 \pm y_1)^2 \\
& + p_{12}(x_1 \pm y_2)^2 + p_{22}(x_2 \pm y_2)^2 + p_{32}(x_3 \pm y_2)^2 \\
& + p_{13}(x_1 \pm y_3)^2 + p_{23}(x_2 \pm y_3)^2 + p_{33}(x_3 \pm y_3)^2
\end{aligned}
$$

Upon clearing parentheses and collecting common terms this
becomes

$$
\begin{aligned}
& (p_{11} + p_{12} + p_{13})x_1^2 + (p_{21} + p_{22} + p_{23})x_2^2 + (p_{31} + p_{32} + p_{33})x_3^2 \\
& + (p_{11} + p_{21} + p_{31})y_1^2 + (p_{12} + p_{22} + p_{32})y_2^2 + (p_{13} + p_{23} + p_{33})y_3^2 \\
& \pm 2(p_{11}x_1y_1 + p_{21}x_2y_1 + p_{31}x_3y_1 + p_{12}x_1y_2 + p_{22}x_2y_2 + p_{32}x_3y_2 \\
& \qquad + p_{13}x_1y_3 + p_{23}x_2y_3 + p_{33}x_3y_3)
\end{aligned}
$$

which reduces to

$$
p_1x_1^2 + p_2x_2^2 + p_3x_3^2 + p'_1y_1^2 + p'_2y_2^2 + p'_3y_3^2 \pm 2\sum_i \sum_j p_{ij}x_iy_j
$$

Accordingly,

$$
\sigma_z^2 = \sigma_x^2 + \sigma_y^2 \pm 2r\sigma_x\sigma_y, \qquad \text{since} \quad r = \frac{\sum_i \sum_j p_{ij}x_iy_j}{\sigma_x\sigma_y}
$$

That is, the variance of a sum of two variables equals the sum of
their variances plus twice the correlation coefficient times the product
of the two standard deviations. Also, the variance of a difference of
two variables equals the sum of their variances minus twice the
correlation coefficient times the product of the two standard deviations.
Finally, if the two variables are independent, r = 0 and the variance
of their sum or difference equals the sum of their variances.
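The variance formula can be checked in the same way. The Python sketch below reuses an invented joint table for illustration. It works with the product-moment sum $\sum\sum p_{ij}x_iy_j$, which equals $r\sigma_x\sigma_y$, so no square roots are needed and the arithmetic stays exact. It also shows that a table built as the product of its margins (the independent case) gives a zero cross-product term.

```python
from fractions import Fraction

# Hypothetical joint table p[i][j] = P(X_i and Y_j); values are illustrative only.
X = [Fraction(2), Fraction(5), Fraction(9)]
Y = [Fraction(1), Fraction(4), Fraction(6)]
p = [[Fraction(1, 10), Fraction(1, 20), Fraction(1, 20)],
     [Fraction(1, 20), Fraction(3, 10), Fraction(1, 10)],
     [Fraction(1, 20), Fraction(1, 10), Fraction(1, 5)]]
cells = [(i, j) for i in range(3) for j in range(3)]

mean_x = sum(p[i][j] * X[i] for i, j in cells)
mean_y = sum(p[i][j] * Y[j] for i, j in cells)
# Deviations from the means: x_i = X_i - mean_x, y_j = Y_j - mean_y.
x = [xi - mean_x for xi in X]
y = [yj - mean_y for yj in Y]

var_x = sum(p[i][j] * x[i] ** 2 for i, j in cells)
var_y = sum(p[i][j] * y[j] ** 2 for i, j in cells)
cov_xy = sum(p[i][j] * x[i] * y[j] for i, j in cells)  # equals r * sigma_x * sigma_y

def variance_of_combination(sign):
    """Variance of z = x + sign*y computed directly from the joint distribution."""
    return sum(p[i][j] * (x[i] + sign * y[j]) ** 2 for i, j in cells)

for sign in (1, -1):
    # sigma_z^2 = sigma_x^2 + sigma_y^2 +/- 2 r sigma_x sigma_y
    assert variance_of_combination(sign) == var_x + var_y + 2 * sign * cov_xy

# Independent case: a joint table equal to the product of the margins gives r = 0.
p_x = [sum(p[i]) for i in range(3)]
p_y = [sum(p[i][j] for i in range(3)) for j in range(3)]
cov_indep = sum(p_x[i] * p_y[j] * x[i] * y[j] for i, j in cells)
assert cov_indep == 0
```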





CHAPTER XVII 

ANALYSIS OF VARIANCE 


In recent years much use has been made of a so-called “analysis
of variance.” 1 This has been particularly true in the field of
biological and agricultural experiments. The method has such 
wide application, however, that there is hardly any field of 
statistical investigation in which it cannot be employed. 

Analysis of variance is essentially a method of testing for the 
existence of correlation or association. The technique consists 
in classification and cross classification on either a qualitative or 
a quantitative basis and in comparison of the variation from class 
to class with the variation within classes. The analysis is so 
arranged that variation within classes can presumably be attributed
to chance. The test of association or correlation consists
in comparing the variation between classes with the supposed 
chance variation within classes. The problems discussed below 
will illustrate the details of this procedure. 

PROBLEMS INVOLVING A SINGLE BASIS OF CLASSIFICATION 

Nature of Problems. In the simplest cases of analysis of 
variance there is only one basis of classification. This will 
accordingly be the first type of problem to be discussed. 

In Table 49, on page 426, are listed the grades of 15 representative
students 2 in an elementary course in economics. These are

1 Historically, the method can be traced at least to the 1910 edition of
G. Udny Yule's Introduction to the Theory of Statistics, in which Chap. V on
Manifold Classification introduces the use of contingency tables that are
the ancestors of modern analysis of variance tables. A certain similarity
also is revealed between analysis of variance and Lexis's analysis of
subnormal and supernormal dispersion. Cf. H. L. Rietz, Mathematical Statistics,
pp. 146-155. For the original work see W. Lexis, “Über die Theorie der
Stabilität statistischer Reihen,” Jahrbücher für Nationalökonomie und
Statistik, Vol. 32 (1879), pp. 60-98; and W. Lexis, Abhandlungen zur Theorie
der Bevölkerungs- und Moralstatistik (1903), Chaps. V-IX.

2 Actually, each grade is the average of the grades of several students,
but for the present analysis it will be considered a single individual grade.
classified according to the teacher of the student, and the problem
is to determine whether the difference in teaching has an effect
on the grades of the students. If the difference in teaching has
an effect, it will be reflected in variation in the mean grades of
students as between teachers. The problem is therefore to
determine whether the variation in mean student grades from
teacher to teacher is greater than can reasonably be attributed
to chance.

Theoretical Basis for the Analysis. It may be assumed that the
variation in the grades of students having the same teacher
is due to the chance difference in ability of the students. This
would be the case if the students were assigned to the teachers
at random. If this variation of grades within each group is
pooled for all groups, a good estimate will be secured of how
much variation can be expected purely as a result of chance.
This will form a standard with which to compare the variation
in the mean student grade from teacher to teacher. Of course
the latter cannot be expected to vary as much as individual
grades, since they are mean values. 1 Allowance for the smaller
variation among means can be made, however, by proper weighting
of the variation in the mean grades before the comparison
is made.

The theoretical basis for the precise method of comparison
that is used may be outlined as follows. The first step in the
analysis is to set up the null hypothesis that the difference in
teaching has no effect on the grades. Under these conditions
both the variation in the means of student grades from teacher
to teacher and the variation of grades within the groups of
students having the same teacher will stem back to variation in
general in student grades, or to what may be called the “population
variance.” If this is large, variation in the mean student
grades from teacher to teacher will tend to be large, and so will
variation around these means. If, on the other hand, the population
variance is small, the variation in mean student grades
from teacher to teacher will tend to be small and so will variation
around these means.

Accordingly, if the null hypothesis is correct, either the sample
variation in the mean student grades from teacher to teacher
or the variation around these mean grades could be used to
estimate the underlying population variance. Of course the two
estimates could not be expected to be the same. In one sample
one estimate would be the larger, in another sample the other.
Whether the null hypothesis is correct could nevertheless be tested
by taking the ratio of the two estimates of variance and seeing
whether or not the sample ratio deviated unreasonably from unity.

1 It will be recalled that the standard error of a mean of N items is
$\sigma/\sqrt{N}$, where σ is the standard deviation of the population.

To put this test into effect, two things must be known: the
precise form of the estimates of variance to be used, and the form
of the sampling distribution of their ratio. The estimates employed
are maximum-likelihood estimates adjusted so that the mean value
of the sample estimate of variance is the population variance. For
this reason they are also called unbiased estimates.

Mathematical analysis shows that the maximum-likelihood
estimate of the population variance that is based upon the
variation in the means is given by

$$
\frac{\sum_r N_r(\bar{X}_r - \bar{X})^2}{r - 1}
$$

that is, by the weighted sum of the squares of the deviations of
the individual means about the mean of the entire sample of
grades, divided by the number of means minus 1. Here $N_r$ is the
number of student grades used to calculate the rth mean, and r is
the number of means calculated, which in this problem is equal
to the number of teachers. This is an estimate of the population
variance based on r − 1 degrees of freedom. It is based on r − 1
and not r degrees of freedom because the sample estimate is
calculated from the variation around the grand mean of the
entire sample and not around the mean of the population.
A second maximum-likelihood estimate of the population
variance is that based upon the variation around the means,
which is given by

$$
\frac{\sum_r \sum_i (X_{ir} - \bar{X}_r)^2}{N - r}
$$

that is, by the pooled sum of the squared deviations from the
individual means divided by the number of cases minus the
number of means. This is an estimate of the population variance
based on N − r degrees of freedom.

These are the two estimates of the population variance that
are used in the analysis of variance. The next question is: What
is the sampling distribution of their ratio? The answer given by
mathematical analysis is as follows: If the original population is
normal and if the null hypothesis is correct, the ratio of the two
maximum-likelihood estimates of the population variance will
tend to fluctuate from sample to sample in accordance with the
F distribution. If the estimate based upon the variation in the
means is put in the numerator of the ratio and the estimate based
upon the variation around the means is put in the denominator,
the appropriate F curve is that for which $n_1 = r - 1$ and
$n_2 = N - r$.
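To make the two estimates concrete, the Python sketch below computes them straight from their definitions for the grades of Table 49 (one list per teacher). It forms their ratio with $n_1 = r - 1$ and $n_2 = N - r$. The function name `one_way_estimates` is a hypothetical helper for illustration, not part of the text. The numbers it produces can be compared with the worked calculation for Table 49 later in this section.

```python
# Grades from Table 49, one list per teacher (I-V).
groups = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]

def one_way_estimates(groups):
    """Return (between estimate, within estimate, ratio, n1, n2) from the definitions."""
    r = len(groups)                          # number of means
    N = sum(len(g) for g in groups)          # number of cases
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]
    # Weighted squared deviations of the group means from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Pooled squared deviations of the items from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    n1, n2 = r - 1, N - r
    between, within = ss_between / n1, ss_within / n2
    return between, within, between / within, n1, n2

between, within, ratio, n1, n2 = one_way_estimates(groups)
print(f"{between:.4f} {within:.4f} {ratio:.2f} (n1={n1}, n2={n2})")
# prints: 38.5771 59.2875 0.65 (n1=4, n2=10)
```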

The Numerical Analysis. Before proceeding to the application
of the foregoing theory to a concrete problem, certain mathematical
relationships should be noted that will be helpful in
carrying out the numerical calculations. A study of the theoretical
formulas shows that the numerator of each of the estimates
of variance is a sum of squares. These could be calculated
directly, of course, but it is usually easier to make use of the
following identity,

$$
\sum_r \sum_i (X_{ir} - \bar{X})^2 = \sum_r N_r(\bar{X}_r - \bar{X})^2 + \sum_r \sum_i (X_{ir} - \bar{X}_r)^2 \qquad (1)
$$

which says that the total variation in the data, as represented
by the total sum of squares, may be broken up into two parts,
one consisting of the variation in the means (as represented by the
weighted sum of the squared deviations of the individual means
from the grand mean) and the other consisting of the variation
around the means (as represented by the pooled sum of the
squared deviations of the individual items from the mean of
each group).
Since it is usually easier to calculate the total sum of squares
and that pertaining to the means than it is to compute the sum
of squares of the residual deviations, it is common practice
to calculate the latter by taking the difference between the other
two more easily calculated values.
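Identity (1) and the subtraction shortcut can be checked numerically. The Python sketch below uses the Table 49 grades for illustration. It computes the total and between-means sums of squares and takes the residual sum by difference. It then confirms that the residual matches the pooled sum of squared deviations computed directly.

```python
import math

# Grades from Table 49, one list per teacher (I-V).
groups = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]
N = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

total_ss = sum((x - grand_mean) ** 2 for g in groups for x in g)
between_ss = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
within_ss_direct = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

# Identity (1): total = between + within, so the residual (within) sum of
# squares can be obtained by subtraction instead of being computed directly.
within_ss_by_difference = total_ss - between_ss
assert math.isclose(within_ss_by_difference, within_ss_direct)
print(round(total_ss, 4), round(between_ss, 4), round(within_ss_by_difference, 4))
```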

In calculating the sums of squares the following special formulas
are found useful. Thus the total sum of squares can be
computed from the equation¹

$$
\sum (X_i - \bar{X})^2 = \sum X_i^2 - N\bar{X}^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{N} \qquad (2)
$$

and the weighted sum of squares of the deviations of the individual
means from the grand mean can be computed from the
equation²

$$
\sum_r N_r(\bar{X}_r - \bar{X})^2 = \sum_r \frac{[\sum X]_r^2}{N_r} - \frac{(\sum X)^2}{N} \qquad (3)
$$

in which $[\sum X]_r^2$ refers to the square of the sum of the grades in
the rth row.

Table 49. Grades of Representative Students Classified by Teachers

Teachers   Grades of students            Mean grade
I          83.25    77.50    71.00       77.25
II         88.75    74.75    70.00       77.83
III        76.25    67.25    69.25       70.92
IV         78.75    68.75    62.25       69.92
V          81.50    75.75    64.75       74.00


The application of these equations to the data of Table 49 is
carried out in the calculations of Table 50. This latter table
shows that $\sum X_i^2 = 82{,}850.1875$ and $(\sum X_i)^2/N = 82{,}103.0042$.
Hence the total sum of squares $\sum(X_i - \bar{X})^2$ is equal to
82,850.1875 − 82,103.0042 = 747.1833.
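The shortcut formulas (2) and (3) need only sums and sums of squares. The Python sketch below applies them to the Table 49 grades with exact fractions. `Fraction` accepts decimal strings, so the worksheet figures can be reproduced without rounding. The residual sum is then obtained by subtraction, as described above.

```python
from fractions import Fraction

# Table 49 grades as exact fractions, one tuple per teacher (I-V).
rows = [("83.25", "77.50", "71.00"), ("88.75", "74.75", "70.00"),
        ("76.25", "67.25", "69.25"), ("78.75", "68.75", "62.25"),
        ("81.50", "75.75", "64.75")]
groups = [[Fraction(s) for s in row] for row in rows]
all_grades = [x for g in groups for x in g]
N = len(all_grades)

sum_of_squares = sum(x * x for x in all_grades)                 # sum of X^2
correction = sum(all_grades) ** 2 / N                            # (sum X)^2 / N
weighted_row_term = sum(sum(g) ** 2 / len(g) for g in groups)    # sum over r of [sum X]_r^2 / N_r

total_ss = sum_of_squares - correction          # Eq. (2)
between_ss = weighted_row_term - correction     # Eq. (3)
within_ss = total_ss - between_ss               # by subtraction, via identity (1)

for label, value in [("sum X^2", sum_of_squares), ("(sum X)^2/N", correction),
                     ("row term", weighted_row_term), ("total SS", total_ss),
                     ("between SS", between_ss), ("within SS", within_ss)]:
    print(f"{label:>12}: {float(value):,.4f}")
```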

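1 $\sum(X_i - \bar{X})^2 = \sum X_i^2 - 2\bar{X}\sum X_i + N\bar{X}^2 = \sum X_i^2 - N\bar{X}^2$, since
$N\bar{X} = \sum X_i$.

2 The equation follows from the fact that $N_r\bar{X}_r^2 = [\sum X]_r^2/N_r$ for each row, and
the total for all rows is $\sum_r N_r\bar{X}_r^2 = \sum_r [\sum X]_r^2/N_r$, while, as in footnote 1,
$\sum_r N_r(\bar{X}_r - \bar{X})^2 = \sum_r N_r\bar{X}_r^2 - N\bar{X}^2$.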
Table 50 also shows that

$$
\sum_r \frac{[\sum X]_r^2}{N_r} - \frac{(\sum X)^2}{N} = 82{,}257.3125 - 82{,}103.0042 = 154.3083,
$$

which is the sum of the weighted squared deviations of the
individual means from the grand mean. The difference between
the total sum of squares and this sum, namely the pooled sum of
squares about the individual means, $\sum_r \sum_i (X_{ir} - \bar{X}_r)^2$, is thus
equal to 747.1833 − 154.3083 = 592.875.

Table 50. Worksheet for Calculating the Various Sums of Squares
for Analysis of Variance

Teacher   Squares of individual grades              Partial sums   Squares of partial sums
I         6,930.5625   6,006.2500   5,041.0000      231.75         53,708.0625
II        7,876.5625   5,587.5625   4,900.0000      233.50         54,522.2500
III       5,814.0625   4,522.5625   4,795.5625      212.75         45,262.5625
IV        6,201.5625   4,726.5625   3,875.0625      209.75         43,995.0625
V         6,642.2500   5,738.0625   4,192.5625      222.00         49,284.0000
Total     82,850.1875                               1,109.75       246,771.9375

To make the appropriate estimates of the population variance 
the last two sums of squares are each divided by the proper 
degrees of freedom. Thus 154.3083 is divided by 5 — 1 = 4 to 
give 38.5771 as the estimate of the population variance based 
on the variation among teachers in mean student grades. Similarly,
592.8750 is divided by 15 − 5 = 10 to give 59.2875 as the
estimate of variance based on the variation around the means. 
The ratio of these two estimates is 38.5771/59.2875 = .65. 

To determine whether this sample ratio justifies the rejection 
or acceptance of the null hypothesis requires the selection of a 
suitable region of rejection. On the assumption that the risk 
of rejecting the null hypothesis when it is true will be put at the 
usual figure of 1 in 20, the size of the region of rejection will be
.05. This is purely arbitrary, however, and might be set at 
.01 or .10 or any other figure, depending on the coefficient of risk 
adopted. A more significant question relates to the distribution 
of the region. Since the difference in teaching, if it had any 
effect, would tend to be revealed in a larger variation among the 
teachers in mean student grades, values of the ratio that are 
smaller than 1 would seem to be of little significance. The risk 
that is to be minimized by the proper choice of the region of 
rejection is the risk of accepting the null hypothesis when in fact 
the variation among teachers in the mean student grade is larger 
than may be due to chance. It would seem, therefore, that the 
proper region of rejection in this instance would be the upper 
.05 tail of the F distribution. This is the region that will be 
adopted in this and in all subsequent analyses of variance. 

For $n_1 = 4$ and $n_2 = 10$, the upper .05 point of the F distribution
is 3.478. The sample ratio in the present problem is .65
and obviously does not fall in the region of rejection. 1 Therefore
the null hypothesis is accepted, and the variation in mean
student grades from teacher to teacher is to be attributed to
chance and not to any difference in teaching.
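The upper .05 point of 3.478 is read from a table of the F distribution. As an illustration only (a swapped-in technique; the text relies on printed tables), the Python sketch below computes it from the distribution function $P(F \le f) = I_{n_1 f/(n_1 f + n_2)}(n_1/2,\, n_2/2)$. Here $I$ is the regularized incomplete beta function, evaluated by the standard continued fraction (as in Numerical Recipes). The upper point is then located by bisection. All function names are hypothetical helpers.

```python
import math

def _beta_continued_fraction(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def regularized_incomplete_beta(a, b, x):
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b

def f_cdf(f, n1, n2):
    """P(F <= f) for the F distribution with n1 and n2 degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return regularized_incomplete_beta(n1 / 2, n2 / 2, n1 * f / (n1 * f + n2))

def f_upper_point(alpha, n1, n2):
    """Value of F exceeded with probability alpha, found by bisection."""
    target, lo, hi = 1.0 - alpha, 0.0, 1.0
    while f_cdf(hi, n1, n2) < target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if f_cdf(mid, n1, n2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

point = f_upper_point(0.05, 4, 10)
sample_ratio = 38.5771 / 59.2875
print(f"upper .05 point = {point:.3f}; sample ratio = {sample_ratio:.2f}; "
      f"reject = {sample_ratio > point}")
```

Because the rejection region is the upper tail only, a ratio below 1 can never exceed this point. That is the observation made in the footnote below.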

1 If the ratio is less than unity, it will never fall in the region of rejection.
In such cases, calculation of the two estimates of variance will of itself give
sufficient evidence for acceptance of the null hypothesis.
PROBLEMS INVOLVING MORE THAN ONE BASIS
OF CLASSIFICATION

A Single Case in Each Class. In the previous problem it
might have been suspected that, if allowance were made for the
general standing of the students, the difference in teaching might
be revealed. Suppose, as is actually the case, that the first
student of each group is a high-standing student, the second is
one of average standing, and the third one of low standing. Then
the 15 grades may be classified according to two criteria, standing
and teacher, and mean grades may be calculated for the divisions
of each classification. This is done in Table 51, on page 432.

Under such circumstances, two questions may be asked: (1) 
If allowance is made for the difference in standing, has the difference
in teaching any significant effect on grades in the given
course? (2) If allowance is made for the difference in teaching, 
has the difference in standing any significant effect on grades in 
the given course? The former question is the one that primarily 
concerns us here, but the second may also be of interest. 

Theoretical Basis for the Analysis. In this more extended
problem there are three types of variation to be considered.
First, there is the variation in the means of the rows, the mean
student grades for different teachers; second, the variation in the
means of the columns, the mean grades of students of different
standings; third, the variation in the individual grades about
what would be expected from the combined row (teaching) and
column (standing) effects.

This third variation needs further explanation. If a student's
grade differed from the grand mean of all the students' grades
by just the same amount that the average grade of all students
having the same teacher differed from the grand mean plus the
amount that the average grade of all students having the same
standing differed from the grand mean, the grade of this student
would be equal to $\bar{X} + (\bar{X}_r - \bar{X}) + (\bar{X}_c - \bar{X})$. In nearly
every case a student's grade does not come exactly to the sum of
this combined row and column effect but differs from it. This
difference will be given by 1

$$
X_{rc} - \bar{X} - (\bar{X}_r - \bar{X}) - (\bar{X}_c - \bar{X}) = X_{rc} - \bar{X}_r - \bar{X}_c + \bar{X}
$$

1 Individual grades are now designated by $X_{rc}$ instead of $X_i$, since any
individual student has both a teacher and a standing and there is only one
student for any given combination of the two.

It is the variation in such differences that constitutes the third
type of variation distinguished above.

Whereas the variation in the row means might be due to the
difference in teaching and the variation in the column means
might be due to the difference in standing, this last variation is
presumably due to chance. For brevity it will be called the
“remainder” variation.

If the null hypothesis is set up that the difference in teaching
has no effect on the grades, both the size of the variation in the
row means and the size of the remainder variation will depend
on the size of the variation in students' grades in general, i.e., on
the size of the population variance. As in the previous problem,
each of these two kinds of variation could be used to make an
independent estimate of the population variance, and their ratio
could be used to test the null hypothesis. The analysis is essentially
the same as in the previous instance except that now,
according to the hypothesis, the measure of the chance variation
does not contain any possible effect of the difference in standing.
This test of teaching is therefore independent of any possible effect
of standing.

In the same manner, the null hypothesis that the difference in
standing does not have any effect on the students' grades can be
tested by comparing an estimate of the population variance
based on the variation in the column (standing) means with an
estimate based on the remainder (chance) variation. This is a
test of standing that is independent of the possible effect of
teaching.

In making the foregoing tests, the maximum-likelihood estimates
of the population variance that are used are as follows. 1
The estimate of the population variance based upon the variation
in the row means is

$$
\frac{\sum_r N_r(\bar{X}_r - \bar{X})^2}{r - 1}
$$

The estimate based upon the variation in the column means is

$$
\frac{\sum_c N_c(\bar{X}_c - \bar{X})^2}{c - 1}
$$

and the estimate based upon the remainder variation is

$$
\frac{\sum_r \sum_c (X_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2}{(r - 1)(c - 1)}
$$

1 That these are the proper formulas can be demonstrated by strict
mathematical analysis similar to that of Chap. X. For further discussion,
see J. O. Irwin, “Mathematical Theorems Involved in the Analysis of
Variance,” Journal of the Royal Statistical Society, Vol. 94 (1931), p. 284.
If the null hypothesis that teaching has no effect is correct,
and if the population is normal, the ratio of the first estimate to
the last will have a sampling distribution that is of the form of
the F distribution with $n_1 = r - 1$ and $n_2 = (r - 1)(c - 1)$.
Likewise, if the null hypothesis that standing has no effect is
correct and if the population is normal, the ratio of the second
estimate to the last will have a sampling distribution of the form
of the F distribution with $n_1 = c - 1$ and $n_2 = (r - 1)(c - 1)$.

It is upon these theoretical conclusions that the following
numerical analysis rests.

The Numerical Analysis. In putting the foregoing theory into
practice, use is made of the identity

$$
\sum_r \sum_c (X_{rc} - \bar{X})^2 = \sum_r N_r(\bar{X}_r - \bar{X})^2 + \sum_c N_c(\bar{X}_c - \bar{X})^2 + \sum_r \sum_c (X_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2 \qquad (4)
$$

which merely says that the total sum of squares is equal to the
sum of the sums of squares for each of the three variations: the
variation in the means of the rows, the variation in the means of
the columns, and the remainder variation. The total sum of
squares and the sum of squares for the means of the rows have
already been computed. If the sum of squares for the means
of the columns is also computed, the remainder sum of squares
can be calculated from the foregoing identity.
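Identity (4) can be verified for the grades of Table 51. The Python sketch below computes each component directly from its definition and checks that the three parts add up to the total. It illustrates the identity only and does not reproduce the authors' worksheet procedure. Each row holds c grades and each column holds r grades, so $N_r = c$ and $N_c = r$.

```python
import math

# Table 51: rows are teachers I-V; columns are High, Medium, Low standing.
grades = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]
r, c = len(grades), len(grades[0])
grand = sum(map(sum, grades)) / (r * c)
row_means = [sum(row) / c for row in grades]
col_means = [sum(grades[i][j] for i in range(r)) / r for j in range(c)]

total_ss = sum((x - grand) ** 2 for row in grades for x in row)
row_ss = c * sum((m - grand) ** 2 for m in row_means)   # N_r = c grades per row
col_ss = r * sum((m - grand) ** 2 for m in col_means)   # N_c = r grades per column
remainder_ss = sum(
    (grades[i][j] - row_means[i] - col_means[j] + grand) ** 2
    for i in range(r) for j in range(c)
)

# Identity (4): the three components add up to the total sum of squares.
assert math.isclose(total_ss, row_ss + col_ss + remainder_ss)
print(f"{row_ss:.4f} {col_ss:.4f} {remainder_ss:.4f} {total_ss:.4f}")
```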

The equation for the calculation of the sum of squares for the
means of the columns is similar to that for the means of the rows
and takes the form

$$
\sum_c N_c(\bar{X}_c - \bar{X})^2 = \sum_c \frac{[\sum X]_c^2}{N_c} - \frac{(\sum X)^2}{N} \qquad (5)
$$

in which $[\sum X]_c^2$ refers to the squared sum of the r grades in
the cth column. From Tables 51 and 52, it is seen that

$$
\sum_c \frac{[\sum X]_c^2}{N_c} = \frac{413{,}105.8125}{5} = 82{,}621.1625
$$

From Table 50 it was found that

$$
\frac{(\sum X)^2}{N} = 82{,}103.0042
$$

Hence, by Eq. (5),

$$
\sum_c N_c(\bar{X}_c - \bar{X})^2 = 82{,}621.1625 - 82{,}103.0042 = 518.1583
$$


Table 51. Grades of Students Classified by Teacher and Standing

                     Standing
Teacher       High     Medium    Low      Mean grade
I             83.25    77.50     71.00    77.25
II            88.75    74.75     70.00    77.83
III           76.25    67.25     69.25    70.92
IV            78.75    68.75     62.25    69.92
V             81.50    75.75     64.75    74.00
Mean grade    81.70    72.80     67.45    73.98


Table 52. Calculation of Sum of Squares for Means of Columns

                   Sums of columns    Squares of sums
High standing      408.50             166,872.2500
Medium standing    364.00             132,496.0000
Low standing       337.25             113,737.5625
Total                                 413,105.8125 = $\sum_c [\sum X]_c^2$


The remainder variation, $\sum_r \sum_c (X_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2$, is calculated
from the identity (4) and is found to be equal to 1

$$
747.1833 - 518.1583 - 154.3083 = 74.7167
$$

The foregoing data may be assembled in an “analysis of
variance” table (Table 53), in which the various estimates of
σ² can be calculated.

Table 53. Analysis of Variance

Source of variation   Sum of squares   Degrees of freedom      Unbiased estimate of σ²
Means of rows         154.3083         r − 1 = 4               38.5771
Means of columns      518.1583         c − 1 = 2               259.0792
Remainder             74.7167          (r − 1)(c − 1) = 8      9.3396
Total                 747.1833         N − 1 = 14




To test the null hypothesis that the variation in the means of
the columns is due to chance, i.e., that the difference in standing
has no effect on the grades in the given course, the ratio
259.0792/9.3396 is calculated. The result is 27.74. For
$n_1 = 2$ and $n_2 = 8$, the .05 point of the F distribution is 4.459.
The sample ratio 27.74 is much greater than 4.459 and thus
clearly lies in the region of rejection. The null hypothesis cannot
be accepted, and it is to be concluded that the difference in
standing does have a definite bearing on the grades obtained in
the given course.

To test the null hypothesis that the variation in the means of
the rows is due to chance, i.e., that the difference in teaching has
no effect on the grades, the ratio 38.5771/9.3396 is calculated.
The result is 4.130. For $n_1 = 4$ and $n_2 = 8$, the .05 point of the
F distribution is 3.838. The proximity of the sample value
4.130 to the .05 point, 3.838, suggests that this problem offers a
borderline case. Nevertheless, the sample value does lie in the region
of rejection, since it is the larger; and if the rule of procedure
is strictly followed, the second null hypothesis cannot be
accepted. It may be concluded in this case that the difference
in mean teacher grades is to be attributed to the difference in
teaching.
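The two F ratios can be reproduced from the entries of Table 53. In the Python sketch below, the sums of squares and degrees of freedom are taken from the text. The upper .05 points are also from the text: 4.459 for $n_1 = 2$, $n_2 = 8$, and 3.838 for $n_1 = 4$, $n_2 = 8$. The dictionary layout is merely a convenient choice for illustration.

```python
# Entries of Table 53: source -> (sum of squares, degrees of freedom).
table_53 = {
    "rows (teaching)": (154.3083, 4),
    "columns (standing)": (518.1583, 2),
    "remainder": (74.7167, 8),
}
# Upper .05 points of F quoted in the text, keyed by n1 (n2 = 8 in both cases).
upper_05 = {2: 4.459, 4: 3.838}

ss_rem, df_rem = table_53["remainder"]
ms_rem = ss_rem / df_rem          # chance (remainder) estimate of the variance
results = {}
for source in ("rows (teaching)", "columns (standing)"):
    ss, df = table_53[source]
    ratio = (ss / df) / ms_rem
    results[source] = ratio
    verdict = "reject" if ratio > upper_05[df] else "accept"
    print(f"{source}: F = {ratio:.3f}, .05 point = {upper_05[df]} -> {verdict} null hypothesis")
```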

1 The total sum of squares is 747.1833 and was calculated on p. 420. The 
sum of squares for the row means is 154.3083 and was calculated on p. 427. 
It will be recalled that in the previous case the hypothesis that
the difference in teaching had no effect on the grades was not
rejected. Here it is rejected. The reason for the different conclusion
lies in the different measure of chance variation. In the
former problem all the variation over and above the variation
in the mean student grades from teacher to teacher was taken
to be chance variation. This represented chance variation in
student ability in the course, due to any cause whatever. Compared
with such chance variation, the variation in mean student
grades from teacher to teacher was not deemed significant.
That is, the variation in mean student grades from teacher to
teacher might easily have been explained by the chance difference
in ability of the students assigned to each teacher. 1

In the second problem, in which not only different teachers but
differences in standing are involved, the measure of chance
variation is taken to be the variation remaining after the variation
due to difference in teaching and the variation due to
difference in standing have both been eliminated. That is, it is
the chance variation in student ability within the groups of high,
average, and low standing. It is the chance variation remaining
after student standing has been roughly accounted for. Compared
with such a chance variation, the difference in teaching does
appear to have some effect on the students' grades.

In conclusion, it should be pointed out that in the present
problem the test of the variation in the means of the rows is independent
of whether the variation in the means of the columns is due to
chance or not. The test is valid in any case. If it should appear
from a previous test that the variation in the means of the columns
is due to chance, then it is better to combine this variation with
the remainder variation so as to get an estimate of chance variation
based upon a greater number of degrees of freedom. For it
is better to use the test pertinent to a single basis of classification
than to use the present test, if the variation in the means of the
columns is due to chance, because the estimate based on a larger
number of degrees of freedom gives a more reliable estimate of
the chance variation. What has been said about variation in the
means of the rows applies in turn to the variation in the means of
the columns. The test is equally independent of whether the
variation in the means of the rows is due to chance or not. If
the latter is due to chance, however, the test of the former is
best based upon a chance variation that includes the variation
in the means of the rows.

1 Strictly speaking, the analysis in that problem was not valid owing to
the way in which the students' grades were selected. The students were not
assigned to the teachers entirely at random, but in order to serve for further
analysis the students were so chosen that each teacher had an equal number
of high-, average-, and low-standing students.

More than One Case in Each Class. The Problem. In the 
preceding problems it was assumed that there was only one 
student of each standing assigned to each teacher. As a matter 
of fact, four students of each standing were assigned to each 
teacher, and the grades of the previous problem were actually 
the mean grades of these groups of four students, rather than 
grades of individual students. The full data are given in Table 
54, on page 437. 

The problem now to be considered is the analysis of variance
of this full set of data shown in Table 54. Three questions may
be asked: (1) Is the variation in the means of the rows greater
than may reasonably be attributed to chance? (2) Is the variation
in the column means greater than may reasonably be attributed
to chance? (3) Are the individual cell means what may
reasonably be expected on the assumption that each individual
item is a mere sum of a row effect plus a column effect, or do these
cell means give evidence of an interaction between row and column
effects? In other words, the three questions are: (1) Has the
difference in teaching an effect on grades? (2) Does difference
in general ability affect grades? (3) Is there any interaction
between teacher and student ability, i.e., does one teacher do
better with high- (or low-) standing students than another?

Theoretical Basis for the Analysis. In this problem there are
four types of variation to be considered. There are the variation
in the row means, the variation in the column means, the variation
in the means of each cell about what may be expected from
a linear combination of row and column effects (the so-called
“interaction”), and, finally, the variation in the individual items
about the means of each cell.

The variation in the individual items about the means of the 
cells is presumably due to chance and may be spoken of as the 
“chance” or “remainder” variation. It will be noted that this 
is not the same as the remainder variation of the preceding 
problem. The remainder variation in the preceding problem
is now the “interaction.” Previously, the remainder variation
was presumed to represent chance, for there was no other basis 
for measuring chance variation. Here there is another basis for 
measuring chance variation, and the former remainder variation 
may now be tested for the possible existence of correlation or 
interaction. 

In the present problem three hypotheses may be tested: first,
the null hypothesis that the difference in teaching has no
effect on grades; second, the null hypothesis that the difference
in standing has no effect on grades; third, the null hypothesis
that there is no real interaction between teaching and standing.

To test the first hypothesis an estimate is made of the population
variance based upon the variation in the row means, and
this is compared with an estimate based upon the remainder or
chance variation. To test the second hypothesis an estimate
of the population variance based upon the variation in the column
means is compared with the estimate based upon chance variation.
To test the third hypothesis an estimate based upon the
interaction is compared with the chance estimate. The estimates
based upon the different sources of variation are the following,
each of which is a maximum-likelihood, or unbiased, estimate:

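Estimate based on the variation in the means of the rows:

$$
\frac{\sum_r N_r(\bar{X}_r - \bar{X})^2}{r - 1}
$$

Estimate based upon the variation in the means of the columns:

$$
\frac{\sum_c N_c(\bar{X}_c - \bar{X})^2}{c - 1}
$$

Estimate based upon the interaction:

$$
\frac{\sum_r \sum_c N_{rc}(\bar{X}_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2}{(r - 1)(c - 1)}
$$

Estimate based upon the remainder or chance variation:

$$
\frac{\sum_r \sum_c \sum_i (X_i - \bar{X}_{rc})^2}{N - rc}
$$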
Mathematical analysis shows that, if the first null hypothesis
is correct and if the population is normal, the ratio of the first
estimate to the last has a sampling distribution of the form of
the F distribution with $n_1 = r - 1$ and $n_2 = N - rc$. If the
second null hypothesis is correct and the population is normal,
the ratio of the second estimate to the last has a sampling distribution
of the form of the F distribution with $n_1 = c - 1$ and
$n_2 = N - rc$; and if the third null hypothesis is correct and the
population is normal, the ratio of the third estimate to the last
has a sampling distribution of the form of the F distribution with
$n_1 = (r - 1)(c - 1)$ and $n_2 = N - rc$. The analysis thus
proceeds in much the same manner as in the previous problems.

Table 54. Grades of Students in a Given College Course Classified
According to Their General Standing and Teachers

Teacher   High standing     Medium standing   Low standing      Means of rows
I         90  76  85  82    87  77  75  71    80  77  68  59
          Average 83.25     Average 77.50     Average 71.00     77.25
II        88  85  90  92    58  77  81  83    72  58  80  70
          Average 88.75     Average 74.75     Average 70.00     77.83
III       84  80  60  81    62  73  70  64    61  77  72  67
          Average 76.25     Average 67.25     Average 69.25     70.92
IV        80  76  90  69    65  69  77  64    63  70  46  70
          Average 78.75     Average 68.75     Average 62.25     69.92
V         80  80  75  91    80  80  76  67    47  62  81  69
          Average 81.50     Average 75.75     Average 64.75     74.00
Means of
columns   81.70             72.80             67.45             Grand mean 73.98

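Table 55. Calculations for Analysis of Variance

Squares of individual grades, arranged as in Table 54:

Teacher  High standing                Medium standing              Low standing
I        8,100  5,776  7,225  6,724   7,569  5,929  5,625  5,041   6,400  5,929  4,624  3,481
II       7,744  7,225  8,100  8,464   3,364  5,929  6,561  6,889   5,184  3,364  6,400  4,900
III      7,056  6,400  3,600  6,561   3,844  5,329  4,900  4,096   3,721  5,929  5,184  4,489
IV       6,400  5,776  8,100  4,761   4,225  4,761  5,929  4,096   3,969  4,900  2,116  4,900
V        6,400  6,400  5,625  8,281   6,400  6,400  5,776  4,489   2,209  3,844  6,561  4,761

Total = 334,735 = $\sum X_i^2$

Sums of row grades and squares of sums of row grades:

Teacher  Sum of grades  Square of sum
I        927            859,329
II       934            872,356
III      851            724,201
IV       839            703,921
V        888            788,544
Total    4,439          3,948,351

(4,439 = $\sum X_i$ and 3,948,351 = $\sum_r [\sum X_i]_r^2$)

Sums of column grades and squares of sums of column grades:

Standing  Sum of grades  Square of sum
High      1,634          2,669,956
Medium    1,456          2,119,936
Low       1,349          1,819,801
Total     4,439          6,609,693

(4,439 = $\sum X_i$ and 6,609,693 = $\sum_c [\sum X_i]_c^2$)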

The Numerical Analysis. In the numerical calculations use is
again made of an identity in which the total sum of squares is
broken into its various components. This identity is as follows:

$$\sum (X_i - \bar{X})^2 = \sum_r N_r(\bar{X}_r - \bar{X})^2 + \sum_c N_c(\bar{X}_c - \bar{X})^2 + \sum_r \sum_c N_{rc}(\bar{X}_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2 + \sum_r \sum_c \sum_i (X_i - \bar{X}_{rc})^2 \qquad (6)$$
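As a check on the identity above, the following Python sketch recomputes each component directly from its definition, using the grades of Table 54 (rows are teachers I to V, columns are high, medium, and low standing). The function name `decompose` is illustrative, not from the text. The components add up exactly here because every cell contains the same number of grades; with unequal cell frequencies the cross-product terms would not, in general, vanish.

```python
from statistics import fmean

# Grades of Table 54: TABLE_54[teacher][standing] holds the four grades of one cell.
# Rows are teachers I-V; columns are high, medium, and low standing.
TABLE_54 = [
    [[90, 76, 85, 82], [87, 77, 75, 71], [80, 77, 68, 59]],
    [[88, 85, 90, 92], [58, 77, 81, 83], [72, 58, 80, 70]],
    [[84, 80, 60, 81], [62, 73, 70, 64], [61, 77, 72, 67]],
    [[80, 76, 90, 69], [65, 69, 77, 64], [63, 70, 46, 70]],
    [[80, 80, 75, 91], [80, 80, 76, 67], [47, 62, 81, 69]],
]


def decompose(table):
    """Return the five sums of squares of the identity, each computed from deviations."""
    r, c = len(table), len(table[0])
    items = [x for row in table for cell in row for x in cell]
    grand = fmean(items)
    row_items = [[x for cell in row for x in cell] for row in table]
    col_items = [[x for row in table for x in row[j]] for j in range(c)]
    row_means = [fmean(xs) for xs in row_items]
    col_means = [fmean(xs) for xs in col_items]
    cell_means = [[fmean(cell) for cell in row] for row in table]

    total = sum((x - grand) ** 2 for x in items)
    rows = sum(len(xs) * (m - grand) ** 2 for xs, m in zip(row_items, row_means))
    cols = sum(len(xs) * (m - grand) ** 2 for xs, m in zip(col_items, col_means))
    interaction = sum(
        len(table[i][j]) * (cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    remainder = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(r)
        for j in range(c)
        for x in table[i][j]
    )
    return total, rows, cols, interaction, remainder


total, rows, cols, interaction, remainder = decompose(TABLE_54)
print(f"total {total:.2f} = {rows:.2f} + {cols:.2f} + {interaction:.2f} + {remainder:.2f}")
# prints: total 6322.98 = 617.23 + 2072.63 + 298.87 + 3334.25
```

Computing from deviations is slower by hand than the shortcut formulas that follow, but it shows that the five quantities are exactly the terms of the identity.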
As in the other problems, the first four sums of squares are calculated
directly, and the remainder sum of squares is calculated
as a residual. These calculations are shown in Table 55.

From Table 54 it is seen that the number of grades in each row
is 12; each teacher has 4 students' grades of high, low, and
medium standing, making 12 altogether, and hence $N_r = 12$.
The number of grades in each column is 20; therefore, $N_c = 20$.
The number of grades in each cell (one column of one row) is 4; thus,
$N_{rc} = 4$. The total number of grades is 60, and $N = 60$. The
following calculations may now be made:

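The total sum of squares is best calculated from the formula

$$\sum (X_i - \bar{X})^2 = \sum X_i^2 - \frac{(\sum X_i)^2}{N} \qquad (7)$$

Table 55 shows that this is equal to

$$\sum X_i^2 - \frac{(\sum X_i)^2}{N} = 334{,}735 - \frac{(4{,}439)^2}{60} = 334{,}735 - 328{,}412.02 = 6{,}322.98$$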

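The weighted sum of the squared deviations of the means of
the rows is calculated from the equation

$$\sum_r N_r(\bar{X}_r - \bar{X})^2 = \sum_r \frac{[\sum X_i]_r^2}{N_r} - \frac{(\sum X_i)^2}{N} \qquad (8)$$

where $[\sum X_i]_r^2$ represents the square of the sum of the individual
items in the rth row. Table 55 shows this second sum of
squares to be

$$\sum_r \frac{[\sum X_i]_r^2}{N_r} - \frac{(\sum X_i)^2}{N} = \frac{3{,}948{,}351}{12} - \frac{(4{,}439)^2}{60} = 329{,}029.25 - 328{,}412.02 = 617.23$$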
The weighted sum of the squared deviations of the means of
the columns may be calculated from the equation

$$\sum_c N_c(\bar{X}_c - \bar{X})^2 = \sum_c \frac{[\sum X_i]_c^2}{N_c} - \frac{(\sum X_i)^2}{N} \qquad (9)$$

in which $[\sum X_i]_c^2$ represents the square of the sum of the individual
items in the cth column. Table 55 shows that this third
sum of squares is equal to

$$\sum_c \frac{[\sum X_i]_c^2}{N_c} - \frac{(\sum X_i)^2}{N} = \frac{6{,}609{,}693}{20} - \frac{(4{,}439)^2}{60} = 330{,}484.65 - 328{,}412.02 = 2{,}072.63$$

The fourth sum of squares is best calculated indirectly. The
first step is to calculate the weighted sum of the squared deviations
of the cell means from the grand mean. This may be
obtained from the equation

$$\sum_r \sum_c N_{rc}(\bar{X}_{rc} - \bar{X})^2 = \sum_r \sum_c N_{rc}\bar{X}_{rc}^2 - \frac{(\sum X_i)^2}{N} \qquad (10)$$

Since $N_{rc}$ is constant in the present problem, the first term on the
right may be written $N_{rc}\sum_r \sum_c \bar{X}_{rc}^2$. To calculate this term
it is thus necessary only to square each of the cell means, sum the
results, and multiply by $N_{rc}$. This squaring and summing of the
cell means was already done in Table 50, however,¹ where it was
found that $\sum_r \sum_c \bar{X}_{rc}^2$ (designated as $\sum X_i^2$ in Table 50) equaled
82,850.1875. Accordingly, as indicated in Table 55, the weighted
sum of the squared deviations of the cell means about the grand mean is
equal to

$$\sum_r \sum_c N_{rc}\bar{X}_{rc}^2 - \frac{(\sum X_i)^2}{N} = 4(82{,}850.1875) - 328{,}412.02 = 2{,}988.73$$

The second step in the calculation of the fourth sum of squares
is to subtract from 2,988.73 the weighted sum of the squared
deviations of the means of the rows and the weighted sum of the
squared deviations of the means of the columns. The result,
according to a modification of Eq. (4), is the fourth sum of squares.

1 In Table 50 the cell means of Table 54 were treated as if they
were individual items; hence the sum of the squared items in Table 50 is
equivalent to the sum of the squared cell means of Table 54.
Thus

$$\sum_r \sum_c N_{rc}(\bar{X}_{rc} - \bar{X}_r - \bar{X}_c + \bar{X})^2 = 2{,}988.73 - 2{,}072.63 - 617.23 = 298.87$$

The final, or remainder, sum of squares is now obtained by
subtraction of all the other component sums of squares from the
total sum of squares. That is,

$$\sum_r \sum_c \sum_i (X_i - \bar{X}_{rc})^2 = 6{,}322.98 - 2{,}072.63 - 617.23 - 298.87 = 3{,}334.25$$
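The computing route just described (the correction term, the formulas for the total, row, column, and cell sums of squares, then the interaction and remainder by subtraction) can be sketched as follows. This is a minimal illustration assuming, as in Table 54, that every cell holds the same number of grades; the helper name `shortcut_sums` is hypothetical.

```python
# Grades of Table 54 (rows: teachers I-V; columns: high, medium, low standing).
TABLE_54 = [
    [[90, 76, 85, 82], [87, 77, 75, 71], [80, 77, 68, 59]],
    [[88, 85, 90, 92], [58, 77, 81, 83], [72, 58, 80, 70]],
    [[84, 80, 60, 81], [62, 73, 70, 64], [61, 77, 72, 67]],
    [[80, 76, 90, 69], [65, 69, 77, 64], [63, 70, 46, 70]],
    [[80, 80, 75, 91], [80, 80, 76, 67], [47, 62, 81, 69]],
]


def shortcut_sums(table):
    """Sums of squares from the computing formulas, then interaction and remainder by subtraction."""
    items = [x for row in table for cell in row for x in cell]
    correction = sum(items) ** 2 / len(items)  # (sum of X_i)^2 / N

    # Total sum of squares from the raw squares.
    total = sum(x * x for x in items) - correction

    # Rows: square each row total and divide by the number of items in the row.
    row_items = [[x for cell in row for x in cell] for row in table]
    rows = sum(sum(xs) ** 2 / len(xs) for xs in row_items) - correction

    # Columns: the same for each column total.
    col_items = [[x for row in table for x in row[j]] for j in range(len(table[0]))]
    cols = sum(sum(xs) ** 2 / len(xs) for xs in col_items) - correction

    # Cells: N_rc times the squared cell mean, summed over all cells.
    cells = sum(len(cell) * (sum(cell) / len(cell)) ** 2 for row in table for cell in row) - correction

    interaction = cells - rows - cols              # fourth sum of squares, found indirectly
    remainder = total - rows - cols - interaction  # residual
    return {"total": total, "rows": rows, "columns": cols,
            "interaction": interaction, "remainder": remainder}


for name, value in shortcut_sums(TABLE_54).items():
    print(f"{name:<12}{value:>10.2f}")
```

Working from totals rather than deviations keeps every intermediate quantity a whole number until the final divisions, which is why the text prefers these formulas for hand computation.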

All the foregoing sums of squares are collected in Table 56,
where they are divided by the appropriate degrees of freedom to
obtain various estimates of the population variance. Comparisons
of these estimates afford tests of the various hypotheses to
be considered.

Table 56. Analysis of Variance

Source of variation              Sums of squares   Degrees of freedom   Estimates of σ²
Means of rows                             617.23   5 - 1 = 4                     154.31
Means of columns                        2,072.63   3 - 1 = 2                   1,036.32
Means of cells, or interaction            298.87   4 × 2 = 8                      37.36
Remainder                               3,334.25   60 - 15 = 45                   74.09
Total                                   6,322.98   60 - 1 = 59


To test the null hypothesis that the variation in the row means
is no greater than can reasonably be attributed to chance, i.e., that
the difference in teaching has no effect on grades, the ratio of
the first estimate to the last estimate is computed. It is
found to be 2.083. The .05 point of the F distribution for $n_1 = 4$
and $n_2 = 45$ is approximately 2.58. Since the sample value
of 2.083 is clearly less than this .05 point and hence does not
lie in the region of rejection, the null hypothesis is accepted in
this instance. The fluctuations in grades from teacher to teacher
are thus apparently due to chance, and there is no basis for
attributing them to a difference in teaching. This conclusion contradicts
the result of the previous problem, but since it is based
upon a larger set of data, greater confidence is to be placed in its
validity than in the result obtained in the preceding problem.
To test the null hypothesis that the variation in the column
means is due to chance, i.e., that the difference in general
standing has no effect upon students' grades in the given course,
the ratio of the second estimate to the last estimate is computed.
This gives 13.99, which is far beyond the .05 point of the F distribution
for $n_1 = 2$ and $n_2 = 45$.* Hence the null hypothesis
is rejected, and it is concluded that general standing
does have an effect upon the grades obtained in the given course.

The variation in the cell means may be tested in a like manner
by comparing the estimate of the population variance based
upon the interaction with that based upon the remainder variance.
In the present problem the ratio is obviously less than 1, so that
further analysis is unnecessary. The variation in cell means
is thus attributable to chance; there is no basis for concluding
that students of different standing react differently to different
teachers.
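The three comparisons reduce to dividing each estimate by the remainder estimate. The sketch below takes the sums of squares and degrees of freedom from Table 56 and computes the variance ratios. The .05 points it compares against (about 2.58 for $n_1 = 4$, about 3.20 for $n_1 = 2$, and about 2.15 for $n_1 = 8$, each with $n_2 = 45$) are approximate tabled values supplied here as assumptions for illustration; the code does not compute them.

```python
# Sums of squares and degrees of freedom from Table 56.
TABLE_56 = {
    "rows (teachers)": (617.23, 4),
    "columns (standing)": (2072.63, 2),
    "interaction": (298.87, 8),
}
REMAINDER = (3334.25, 45)

# Approximate .05 points of F for n2 = 45, keyed by n1 (assumed from standard F tables).
F_05 = {4: 2.58, 2: 3.20, 8: 2.15}


def variance_ratios(effects, remainder):
    """Divide each estimate of the variance by the remainder estimate."""
    ss_rem, df_rem = remainder
    ms_rem = ss_rem / df_rem
    results = {}
    for name, (ss, df) in effects.items():
        ms = ss / df
        results[name] = (ms, ms / ms_rem, df)
    return ms_rem, results


ms_rem, results = variance_ratios(TABLE_56, REMAINDER)
print(f"remainder estimate {ms_rem:.2f}")
for name, (ms, f, df) in results.items():
    verdict = "reject" if f > F_05[df] else "accept"
    print(f"{name:<20} estimate {ms:9.2f}  F = {f:6.3f}  {verdict} null hypothesis at .05")
```

Only the column (standing) ratio exceeds its .05 point, matching the conclusions reached above.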

TESTS OF CORRELATION COEFFICIENTS AS AN ANALYSIS OF VARIANCE

In Chap. XII certain tests were given to determine whether or
not correlation coefficients were significantly different from zero.
These tests were in reality an analysis of variance. The variance
accounted for by a line of regression, it will be recalled,³ was
given by $\sigma_1^2 - \sigma_{1.2}^2$. Similarly, the variation in the means of
rows or columns was given by $\sigma_1^2\eta_{12}^2$, and the variation of
the points on a curve of regression was given by $\sigma_1^2$ multiplied by
the square of the index of correlation. If there was correlation between
the data, these variances might be taken as measures of that correlation.
If there was no correlation, however, sample data might still yield values
for these variances that would be merely the result of chance. Whether
there was correlation or not, variation around the line or plane
of regression or around the progression of means could be taken
as a measure of the amount of variation caused by chance forces.

* It is seen from Table IX of the Appendix that this point is in the neighborhood of 3.2.

2 See p. 21.

3 Smith, J. G., and A. J. Duncan, Elementary Statistics and Applications, p. 396.

Accordingly, to determine whether variation accounted for
by a line, plane, curve, or progression of means is due to correlation
or is merely the result of chance, it may be compared with
the variation around the line, plane, curve, or progression of
means. More specifically, the null hypothesis may be set up that
all variation is due to chance. Then an estimate of the population
variance based upon the variation of the points on the curve,
line, or plane, or upon the variation of the mean values, may be
compared with a similar estimate based upon the variation
around these measures of correlation. In all cases, the ratio of
these two estimates will be distributed like the F distribution.¹

For a line of regression, for example, the quantity

$$\frac{N\sigma_1^2 r_{12}^2}{1}$$

is a maximum-likelihood, or unbiased, estimate of the population
variance (assuming no correlation) based on variation along
the line of regression; and the quantity

$$\frac{N\sigma_1^2(1 - r_{12}^2)}{N - 2}$$

is a maximum-likelihood, or unbiased, estimate of the population variance
based on variation around the line of regression. Since these
two estimates have independent sampling distributions, their
ratio has a sampling distribution of the form of the F distribution,
with $n_1 = 1$ and $n_2 = N - 2$. For a plane of regression, the
two maximum-likelihood estimates of the population variance are

$$\frac{N\sigma_1^2 R_{1.23\cdots}^2}{k - 1} \quad\text{and}\quad \frac{N\sigma_1^2(1 - R_{1.23\cdots}^2)}{N - k}$$

and their ratio is distributed like the F distribution with $n_1 = k - 1$
and $n_2 = N - k$, where k is the number of regression statistics
$a_{1.23\cdots}, b_{12.3\cdots}, \ldots$ used in the regression equation. For a
progression of means the maximum-likelihood estimates are

$$\frac{N\sigma_1^2\eta_{12}^2}{k - 1} \quad\text{and}\quad \frac{N\sigma_1^2(1 - \eta_{12}^2)}{N - k}$$

and their ratio is distributed like the F distribution with $n_1 = k - 1$
and $n_2 = N - k$, k being the number of means. Similar formulas
apply to a curvilinear regression. The statistics that were
used in Chap. XII to test the significance of the multiple correlation
coefficient, the correlation ratio, and the correlation index
will thus be recognized as the ratio of two maximum-likelihood
estimates of the population variance (assuming no correlation),
and the analysis there given will be recognized as being essentially
an analysis of variance.

1 This cannot be proved here.
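For a line of regression the argument can be checked numerically. With $\sigma_1^2$ taken as the N-divisor variance of the dependent variable, the ratio of the two estimates reduces to $r^2(N - 2)/(1 - r^2)$, and it equals the ratio formed from the explained and residual sums of squares of a least-squares line. The data and function names below are hypothetical, chosen only for illustration.

```python
from statistics import fmean, pvariance


def line_f_ratio(x, y):
    """F ratio for a line of regression of y on x, built from the two estimates in the text."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    var_x, var_y = pvariance(x), pvariance(y)      # N-divisor variances
    r = cov / (var_x * var_y) ** 0.5
    along = n * var_y * r**2 / 1                   # variation along the line, n1 = 1
    around = n * var_y * (1 - r**2) / (n - 2)      # variation around the line, n2 = N - 2
    return r, along / around


def regression_anova_f(x, y):
    """The same ratio from the sums of squares of a fitted least-squares line."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    fitted = [my + slope * (a - mx) for a in x]
    explained = sum((f - my) ** 2 for f in fitted)
    residual = sum((b - f) ** 2 for b, f in zip(y, fitted))
    return (explained / 1) / (residual / (n - 2))


x = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical data
y = [3, 4, 4, 6, 5, 7, 8, 8]
r, f = line_f_ratio(x, y)
print(f"r = {r:.4f}, F = {f:.3f}, from fitted line = {regression_anova_f(x, y):.3f}")
```

Both routes give the same number because the explained sum of squares of the fitted line is $N\sigma_1^2 r^2$ and the residual sum is $N\sigma_1^2(1 - r^2)$.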

The test of linearity was also an analysis of variance, but the
assumptions are somewhat different. In this case a linear
correlation is assumed to exist, but not a curvilinear correlation.
On the basis of this assumption an estimate of the population
first-order variance is made from the variation of the means (or of
the curve of regression) around the line of regression, and another
estimate is made from the variation around the means (or around
the curve of regression). These two estimates have
independent sampling fluctuations, and their ratio is thus distributed
like the F distribution. More specifically,

$$\frac{N\sigma_1^2(\eta_{12}^2 - r_{12}^2)}{k - 2}$$

(or the same expression with the square of the index of correlation
in place of $\eta_{12}^2$) is a maximum-likelihood estimate of the population
first-order variance based on the variation of the means or curve
around the line of regression; and

$$\frac{N\sigma_1^2(1 - \eta_{12}^2)}{N - k}$$

(or the corresponding expression with the index of correlation) is a
maximum-likelihood estimate of the population first-order variance
based on the variation around the means or curve, k being the number
of means or the number of regression statistics. The ratios of
these two estimates, respectively, are distributed like the F
distribution with $n_1 = k - 2$ and $n_2 = N - k$. If the reader
will turn to page 307, he will note that this ratio is the
statistic that was used to test the linearity of any regression. It
is thus another form of analysis of variance.
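The linearity test can be sketched in the same spirit for a progression of means. The correlation ratio $\eta^2$ is computed from the groups of y values at each x, $r^2$ from the least-squares line, and the two estimates are formed with $k - 2$ and $N - k$ degrees of freedom. The grouped data and the function name are hypothetical.

```python
from statistics import fmean


def linearity_f(groups):
    """F ratio for the test of linearity.

    groups maps each x value to the list of y values observed at that x;
    the k group means form the progression of means.
    """
    xs = [x for x, vals in groups.items() for _ in vals]
    ys = [y for vals in groups.values() for y in vals]
    n, k = len(ys), len(groups)
    mx, my = fmean(xs), fmean(ys)
    total = sum((y - my) ** 2 for y in ys)  # N * sigma_1^2
    between = sum(len(v) * (fmean(v) - my) ** 2 for v in groups.values())
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    eta2 = between / total          # squared correlation ratio
    r2 = sxy ** 2 / (sxx * total)   # squared linear correlation coefficient
    around_line = total * (eta2 - r2) / (k - 2)    # means around the line of regression
    around_means = total * (1 - eta2) / (n - k)    # items around their own group means
    return eta2, r2, around_line / around_means, (k - 2, n - k)


# Hypothetical grouped data: y values observed at each of five x values.
data = {1: [2, 3], 2: [5, 6], 3: [6, 7], 4: [6, 5], 5: [3, 4]}
eta2, r2, f, dfs = linearity_f(data)
print(f"eta^2 = {eta2:.3f}, r^2 = {r2:.3f}, F = {f:.2f} with (n1, n2) = {dfs}")
```

The rise-and-fall pattern in these made-up data gives a large $\eta^2$ but a small $r^2$, so the ratio is large, which is the kind of result that signals a nonlinear regression.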



CHAPTER XVIII 

THE PROBLEM OF NONNORMALITY 

Up to this point most of the discussion has been based upon
the assumption that the sampled population is normally distributed.
This would appear to be a serious restriction, inasmuch
as data that are not normally distributed are frequently
encountered. It is the purpose of this chapter, therefore, to
consider the effect of a departure from normality upon the
sampling distributions of some of the more important statistics.

EXACT SAMPLING DISTRIBUTIONS 

Exact sampling distributions have been worked out for certain
statistics from particular nonnormal populations.¹ The populations
studied have been of all sorts: rectangular, triangular, and
U-shaped populations, populations that conform to a type A
Gram-Charlier series, populations that conform to a type III
Pearsonian curve, and the like. The distributions derived were
mainly for the mean and standard deviation.

Exact Distributions of the Mean. The distribution of the mean
for samples of size N from a rectangular population, $y = 1/a$,
$x = 0$ to $x = a$, has been found to consist of a series of N polynomials,
each of degree $N - 1$ and applicable to a subinterval of
length $a/N$. The curve is bell shaped and resembles the normal
curve when $N \ge 3$. For $N = 2$, the curve reduces to an isosceles
triangle.² It has also been found that the distribution of means
from a Pearsonian type III population is a type III curve,³ and
1 This section is based primarily upon H. L. Rietz, "Topics in Sampling
Theory," Bulletin of the American Mathematical Society, Vol. 43 (1937),
pp. 209-230.

2 Ibid., p. 219.

3 Cf. Church, A. E. R., "On the Means and Standard Deviations of
Small Samples from any Population," Biometrika, Vol. 18 (1926), pp. 321-394,
and Craig, C. C., "Distribution of Means of Samples from a Type A
Population," Annals of Mathematical Statistics, Vol. 2 (1931), pp. 99-101.
distributions of means have been worked out for samples from
triangular and U-shaped populations.¹
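The piecewise-polynomial result for the rectangular population can be made concrete. The sketch below uses the standard formula for the density of a sum of N independent rectangular variates on (0, 1), rescaled to the mean of N variates on (0, a). It illustrates the result described above; it is not necessarily the derivation used in the cited work, and the function name is hypothetical.

```python
from math import comb, factorial, floor


def mean_density(m, n, a=1.0):
    """Density of the mean of n independent rectangular variates on (0, a).

    Between consecutive multiples of a/n the density is a single polynomial
    of degree n - 1, which is the piecewise structure described in the text.
    """
    if m < 0 or m > a:
        return 0.0
    s = n * m / a  # the corresponding sum of n variates on (0, 1)
    terms = sum((-1) ** j * comb(n, j) * (s - j) ** (n - 1) for j in range(floor(s) + 1))
    return (n / a) * terms / factorial(n - 1)


# For N = 2 the curve is the isosceles triangle with peak 2/a at x = a/2.
print(mean_density(0.5, 2), mean_density(0.25, 2))  # prints: 2.0 1.0
```

The term index j picks up one more polynomial each time s passes an integer, that is, each time the mean passes a multiple of a/N.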

Integrations for distributions of means have been worked out
for means of samples from the following populations:²

1. A rectangular population represented by

$$f(x) = \frac{1}{a} \qquad (0 \le x \le a)$$

2. A J-shaped population represented by the declining exponential
curve

$$f(x) = \frac{1}{a}\exp\left(-\frac{x}{a}\right) \qquad (0 \le x < \infty)$$

3. A positively skewed population represented by the Pearsonian
type III or χ² type curve

$$f(x) = kx^{\frac{1}{2}}e^{-\frac{x}{2}} \qquad (0 \le x < \infty)$$

4. A peaked population represented by the double declining
exponential curve

$$f(x) = \frac{1}{2}\exp(-|x|) \qquad (-\infty < x < \infty)$$

for N = 2, 3, and 4 only.

5. A triangular-shaped population represented by the two
straight lines

$$f(x) = \frac{4}{a^2}x \quad \text{when } 0 \le x \le \frac{a}{2}$$

and

$$f(x) = \frac{4}{a^2}(a - x) \quad \text{when } \frac{a}{2} \le x \le a$$

for N = 2 only.

Exact Distributions of Standard Deviation and Variance. For
samples of 2 from a rectangular population (represented by
y = 1, x = 0 to x = 1) it has been found³ that the distribution
of σ was $f(\sigma) = 4(1 - 2\sigma)$. The exact distribution has also been

1 Cf. Rider, Paul, "On Small Samples from Certain Nonnormal Universes,"
Annals of Mathematical Statistics, Vol. 2 (1931), pp. 48-65.

2 See Craig, A. T., "On the Distribution of Certain Statistics," American
Journal of Mathematics, Vol. 54 (1932), pp. 353-366.

3 See Rider, op. cit.

found for the standard deviation of samples of 3 from a rectangular
population.¹
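The result $f(\sigma) = 4(1 - 2\sigma)$ for samples of 2 from the rectangular population on (0, 1) can be checked by simulation. For two values the standard deviation (with divisor N) is half their absolute difference, and integrating the density from 0 to s gives the cumulative probability $1 - (1 - 2s)^2$. The seed and number of trials below are arbitrary choices.

```python
import random


def exact_cdf(s):
    """P(sigma <= s), obtained by integrating f(sigma) = 4(1 - 2 sigma) from 0 to s."""
    s = min(max(s, 0.0), 0.5)
    return 1 - (1 - 2 * s) ** 2


def simulated_cdf(s, trials=200_000, seed=12345):
    """Proportion of simulated samples of 2 whose standard deviation is at most s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x1, x2 = rng.random(), rng.random()
        sigma = abs(x1 - x2) / 2  # equals statistics.pstdev([x1, x2])
        hits += sigma <= s
    return hits / trials


for s in (0.05, 0.1, 0.25, 0.4):
    print(f"sigma <= {s}: exact {exact_cdf(s):.4f}, simulated {simulated_cdf(s):.4f}")
```

The simulated proportions should agree with the exact values to about two decimal places, the sampling error of a proportion based on this many trials.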

Studies of sampling experiments with a Pearsonian type III 
population suggest that the distribution of sample variances from 
such a population may be adequately described by a Pearsonian 
type VI distribution. 2 Studies have also been made of the 
moments of the sampling distribution of the variance with a view 
to obtaining Pearsonian curves to represent the sampling dis¬ 
tribution when the population itself is a Pearsonian type curve. 
By this analysis, light is shed on how the sampling distribution 
changes with changes in population type and changes in size of 
sample. 3 

Exact Distributions of the Statistic t. Studies
have been made of the distribution of sample t's from a rectangular
distribution. For samples of 2 the distribution of t was
obtained explicitly; for samples of 3 it was much more complicated.

These studies indicated that the distribution of t was more
peaked, i.e., had more cases near the middle and at the ends,
when the population was rectangular than when it was normal.⁴
It has also been found that the distribution of t from a U-shaped
population was similar to that for a rectangular population.⁵

In summarizing a recent account of sampling from nonnormal 
populations, Rietz concludes as follows: “Although the prospects 
of obtaining the exact distribution functions of such statistics as 
the standard deviation s or the "Student" ratio z for samples from
a considerable variety of nonnormal populations do not seem 
promising, nevertheless, by the use of moments of moments, and 
experimental sampling, along with the exact determination of 

1 See Rietz, H. L., "Note on the Distribution of the Standard Deviation
of Sets of Three Variates Drawn at Random from a Rectangular Population,"
Biometrika, Vol. 23 (1931), pp. 424-426.

2 See Sophister, "Discussion of Small Samples Drawn from an Infinite
Skew Population," Biometrika, Vol. 20A (1928), pp. 389-423.

3 Le Roux, J. M., "A Study of the Distribution of the Variance in Small
Samples," Biometrika, Vol. 23 (1931), pp. 134-190.

4 Perlo, V., "On the Distribution of Student's Ratio for Samples of Three
Drawn from a Rectangular Distribution," Biometrika, Vol. 25 (1933), pp.
203-204.

5 See Rider, op. cit.
some distribution functions, significant contributions are being
made toward an understanding of the probable nature of certain 
important features of the distributions in question.” 1 

The Use of Normal Theory for Nonnormal Populations. In 
cases in which the population is nonnormal, but the exact forms 
of the various sampling distributions are not known, it is quite a 
common practice to view the results obtained on the assumption 
that the population is normal as fairly good approximations to 
those that would be actually obtained from the nonnormal popu¬ 
lation. The extent to which this practice is justified will now be 
considered. 

When the sample is large, as has been demonstrated above, the
distribution of sample means will tend to be normal in form,
with a mean equal to the mean of the population and a variance
equal to the variance of the population divided by N.* Whatever
the nature of the population, the moments of the distribution
of sample means are:²




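$${}_{\bar{X}}\mu_2 = \frac{\mu_2}{N}, \qquad {}_{\bar{X}}\mu_3 = \frac{\mu_3}{N^2}, \qquad {}_{\bar{X}}\mu_4 = \frac{\mu_4}{N^3} + \frac{3(N - 1)}{N^3}\mu_2^2$$

from which it follows that

$${}_{\bar{X}}\beta_1 = \frac{\beta_1}{N} \qquad (1)$$

and

$${}_{\bar{X}}\beta_2 - 3 = \frac{\beta_2 - 3}{N} \qquad (2)$$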


These suggest a fairly rapid approach to normality in general,
while experiments in sampling from nonnormal populations that
have only a moderate degree of skewness and kurtosis (for
example, when $\beta_1 \le .2$ and $\beta_2 - 3 \le .4$) suggest that the sampling
distribution of the mean approaches normality very
rapidly. This appears to be true, for example, of samples as
small as 10.† Similar statements can be made for a regression
coefficient, the difference between two means, and other statistics
that, like the mean, are linear functions of the sample data. It
is assumed in all these cases that the variance of the population
is known.

1 Rietz, H. L., "Topics in Sampling Theory," Bulletin of the American
Mathematical Society, Vol. 43 (1937), p. 230.

* See pp. 262-263.

2 ${}_{\bar{X}}\mu_2$ means $\mu_2$ for $\bar{X}$, ${}_{\bar{X}}\mu_3$ means $\mu_3$ for $\bar{X}$, etc. Similarly, ${}_{\bar{X}}\beta_1$ means
$\beta_1$ for $\bar{X}$, etc.

† Cf. Church, op. cit.
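Equations (1) and (2) can be verified exactly for a small discrete population by enumerating every sample of N independent draws. The three-point population below is hypothetical, chosen only because it is skewed; exact rational arithmetic avoids any rounding, so the checks are equalities rather than approximations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical skewed population: (value, probability) pairs.
POPULATION = [(0, Fraction(1, 2)), (1, Fraction(3, 10)), (4, Fraction(1, 5))]


def central_moments(dist):
    """Return (mu2, mu3, mu4) for a list of (value, probability) pairs."""
    mean = sum(p * x for x, p in dist)
    return tuple(sum(p * (x - mean) ** k for x, p in dist) for k in (2, 3, 4))


def distribution_of_mean(dist, n):
    """Exact distribution of the mean of n independent draws from dist."""
    out = {}
    for draw in product(dist, repeat=n):
        prob = Fraction(1)
        for _, p in draw:
            prob *= p
        m = Fraction(sum(x for x, _ in draw), n)
        out[m] = out.get(m, 0) + prob
    return list(out.items())


n = 3
mu2, mu3, mu4 = central_moments(POPULATION)
m2, m3, m4 = central_moments(distribution_of_mean(POPULATION, n))
beta1, beta2 = mu3**2 / mu2**3, mu4 / mu2**2

assert m2 == mu2 / n
assert m3 == mu3 / n**2
assert m4 == mu4 / n**3 + 3 * (n - 1) * mu2**2 / n**3
assert m3**2 / m2**3 == beta1 / n               # Eq. (1)
assert m4 / m2**2 - 3 == (beta2 - 3) / n        # Eq. (2)
print("beta1 of mean:", m3**2 / m2**3, "  beta2 - 3 of mean:", m4 / m2**2 - 3)
```

Enumeration grows as (number of population values)^N, so this is practical only for very small N; it is meant as a check on the formulas, not a general method.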

The Statistic t. When the variance of the
population is not known, it will be recalled that fluctuations in
the mean are analyzed through fluctuations in the statistic t,
the deviation of the sample mean from the population mean divided
by the estimated standard error of the mean. A number of studies
have been made of the sampling fluctuations in this statistic when
the population is not normal.¹ When the population was rectangular
or U-shaped, it was found that the distribution of t failed to agree
with the distribution of t for samples from normal populations,
especially near the middle and at the ends of the distribution.² In
general it has been found that the agreement between t for nonnormal
samples and t for normal samples is poorest when the nonnormal
populations are very peaked.³ On the other hand, in the case of
skew triangular populations it was found that, although the
distribution of t was skewed, when the cumulative probability
was considered (i.e., when the two tail areas were added) the
results were about the same as for a normal population. This
suggests that in the case of skew populations, at least, the use
of the normal theory may not give bad approximations to the
correct results. The danger of error apparently is greatest in
very peaked populations.
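A small sampling experiment in the spirit of those described above can be sketched as follows: draw many samples of 10 from a rectangular population and from a normal population, compute t for each, and count how often |t| exceeds 2.262, the two-tailed .05 point of t for 9 degrees of freedom. Here t is taken in the common form (mean minus population mean) divided by $s/\sqrt{N}$, with s using divisor N - 1. The sample size, seed, and number of trials are arbitrary, and the result is a rough illustration, not a reproduction of the cited studies.

```python
import math
import random


def t_statistic(sample, population_mean):
    """t = (mean - population mean) / (s / sqrt(N)), with s using divisor N - 1."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - population_mean) / math.sqrt(s2 / n)


def rejection_rate(draw, population_mean, n=10, trials=20_000, critical=2.262, seed=7):
    """Proportion of simulated samples whose |t| exceeds the normal-theory .05 point."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [draw(rng) for _ in range(n)]
        if abs(t_statistic(sample, population_mean)) > critical:
            hits += 1
    return hits / trials


rect = rejection_rate(lambda rng: rng.random(), 0.5)
norm = rejection_rate(lambda rng: rng.gauss(0.0, 1.0), 0.0)
print(f"rectangular: {rect:.3f}   normal: {norm:.3f}   nominal: 0.050")
```

Replacing the rectangular draw with a more peaked or more skewed generator is a simple way to explore the cases the text singles out.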

1 See references for section above, pp. 446-447.

2 Also see Shewhart, W. A., and Winters, F. W., "Small Samples: New
Experimental Results," Journal of the American Statistical Association.

3 ... Symmetrical and Skew Populations," Biometrika, Vol. 21 (1929).

Analysis of Variance. Although the foregoing conclusions are
based on experiments relating to sample means and regression
coefficients, they apply equally well to analyses of variance in
which $n_1 = 1$, the case in which the sampling distribution of F,
or more precisely $\sqrt{F}$, reduces to the t distribution or half the
t distribution, because $\sqrt{F}$ has no sign. That such
conclusions can be extended to other analyses of variance
requires additional evidence. This has already been provided
in part by other studies.¹ From a study of empirical sampling
as applied to a set of means grouped according to one
principle of classification, Egon S. Pearson concludes that an
analysis based on the assumption of normality gives fairly satisfactory
results even when the population is not normal. This, he
points out, arises from the correlation that may exist between
the variation of the means and the variation about the means
when the population is nonnormal.

Variances. General equations have also been derived for the
moments and β-coefficients of the sampling distribution of the
variance.² These also suggest that the sampling distribution of
the variance tends toward the normal form as N increases. The
approach to normality is not very rapid, however, and it is not
safe to assume that the distribution of the variance is normal
unless the sample is quite large. Furthermore, experimental
evidence indicates that for smaller samples from certain types of
nonnormal populations the sampling distribution of the variance
does not necessarily approximate the χ² distribution, which is
its form for samples from a normal population.

Accordingly, it has been concluded by students of the subject
that "we are liable to go far wrong when using the 'normal
theory' variance distribution to represent p(s²) [i.e., the sampling
distribution of the variance] in samples from nonnormal
parent populations."³ There is some doubt also as to the adequacy
of the F distribution for testing the ratio of
two independent sample variances from nonnormal populations.⁴

1 Pearson, Egon S., "Analysis of Variance in Cases of Nonnormal
Variation," Biometrika, Vol. 23 (1931), pp. 114-133.

2 These are conveniently summarized by Le Roux, op. cit.

3 Ibid., p. 189. The use of the moments of the sampling distribution of
the variance to describe its general form is based on the assumption that it
can be represented by a Pearsonian frequency curve. The problem has also
been attacked from another angle: a general formula has been derived for the
sampling distribution of the variance on the assumption that the
population can be represented by a Gram-Charlier series.
This formula is expressed in terms of the constants of the Gram-Charlier
series for the population and the number of terms in the series.
Cf. "Note on the Distribution of the Standard Deviations and Second
Moments of Samples from a Gram-Charlier Population," Annals of Mathematical
Statistics, Vol. 2 (1931), pp. 48-65. Little use has yet been made
of this approach to the problem.

4 Cf. Pearson, Biometrika, Vol. 23 (1931), pp. 114-133.
Correlation Coefficients. The distribution of the correlation
coefficient of samples from nonnormal populations has been 
studied primarily by sampling experiments. When the popula¬ 
tion correlation coefficient is zero, these experiments suggest that 
the distribution of the correlation coefficient has approximately 
the same form as when the population is normal. 1 

When the population correlation coefficient is not zero, on 
the other hand, and when the samples are small, the agreement 
between the experimental distributions derived from nonnormal 
populations and the theoretical distribution for a normal popula¬ 
tion does not appear to be so good, although the closeness of the 
agreement improves as the size of the sample increases. 2 These 
conclusions are based on a very limited set of data and can at the 
most be considered as tentative only.

INDIRECT ATTACKS ON THE PROBLEM OF NONNORMALITY 

Several indirect attacks on the problem of nonnormality have 
been attempted. One of these is the transformation of the 
original data. It has long been recognized, for example, that in 
certain cases the logarithms of given data are normally dis¬ 
tributed although the data themselves are not. In such cases, 
the problem of nonnormality may readily be solved by a logarith¬ 
mic transformation. A general transformation by which any 
nonnormal distribution might be made approximately normal 
has been proposed, and it has been suggested that the sampling 
distributions of estimates of the parameters of the nonnormal 
distributions might be expressed in terms of the transformation 
and the sampling distributions of estimates of the parameters 
of the normal distribution into which the nonnormal distribution 
has been transformed. 3 This line of attack, however, has not 
as yet been developed to a point where it can be put to practical 
use. 

Another line of attack has turned its attention to the qualita¬ 
tive or semiqualitative aspects of the data and by thus ignoring 

1 Pearson, Egon S., "The Test of Significance for the Correlation Coefficient,"
Journal of the American Statistical Association, Vol. 26 (1931), pp.
128-134.

2 Cheshire, Leone, Elena Oldis, and Egon S. Pearson, "Further
Experiments on the Sampling Distribution of the Correlation Coefficient,"
Journal of the American Statistical Association, Vol. 27 (1932), pp. 121-128.

3 Cf. Annals of Mathematical Statistics, Vol. 3 (1932), pp. 113-123.
some of the quantitative aspects has avoided the necessity of
special assumptions as to the form of the population. Instead 
of determining, for example, whether two samples are from the 
same population by testing the difference between their means, 
variances, etc., it would be possible to group the data into classes 
and to apply a χ² test of independence. A method involving no
assumption as to population form would thus be substituted for 
one necessitating such an assumption. It is to be noted, how¬ 
ever, that this qualitative method is not as efficient as the 
quantitative method, since a certain amount of the information 
provided by the data is ignored. If the population appears to be 
normal, therefore, the more refined methods based upon normal 
assumptions are to be preferred. The qualitative method 
requires also that sufficient data be available to permit grouping 
to test for independence. 
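The qualitative alternative just described (grouping the two samples into classes and testing independence with χ²) amounts to the following computation. The counts are hypothetical, and the function returns the statistic with its (rows - 1)(columns - 1) degrees of freedom; the .05 point must still be looked up in a χ² table.

```python
def chi_square_independence(table):
    """Pearson chi-square for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical: two samples (rows) grouped into three size classes (columns).
samples = [
    [18, 25, 7],
    [10, 24, 16],
]
chi2, df = chi_square_independence(samples)
print(f"chi-square = {chi2:.3f} with {df} degrees of freedom")
```

The classes must be broad enough that no expected count is very small, which is the practical form of the text's remark that enough data must be available to permit grouping.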

For large samples a semiqualitative method is available for
testing the existence of correlation. If in correlating $X_1$ and $X_2$,
for example, only the rank or order of size of the variables is
considered, it is possible to compute a coefficient of "rank
correlation" whose significance may be tested without any
assumption as to the form of the $X_1X_2$ population. If d represents
the difference in rank between sample pairs of $X_1$ and $X_2$
values, then the coefficient of rank correlation is measured by

$$r' = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$$

For a population rank correlation of zero this has a sampling
distribution that is approximately normal in form. The mean
of this sampling distribution is zero, and its standard deviation
is $\sigma_{r'} = 1/\sqrt{N - 1}$. The approximation is valid only for large
samples. For a discussion of this procedure the student is
referred to the work of Hotelling and Pabst.¹
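A direct transcription of the rank-correlation coefficient and its large-sample standard score is sketched below, assuming no tied values (ties would require a correction the text does not discuss). The function names and sample figures are hypothetical.

```python
import math


def ranks(values):
    """Ranks 1..N in order of size; assumes no ties, as the formula in the text does."""
    if len(set(values)) != len(values):
        raise ValueError("tied values would need a correction not covered here")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_correlation(x1, x2):
    """Return r' = 1 - 6*sum(d^2) / (N(N^2 - 1)) and r' divided by its standard deviation."""
    n = len(x1)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x1), ranks(x2)))
    r = 1 - 6 * d2 / (n * (n * n - 1))
    z = r * math.sqrt(n - 1)  # standard deviation of r' is 1/sqrt(N - 1) under zero correlation
    return r, z


# Hypothetical paired observations.
x1 = [12.1, 15.3, 9.8, 20.4, 17.7, 11.0, 14.2, 18.9]
x2 = [30, 41, 28, 52, 39, 33, 36, 50]
print(rank_correlation(x1, x2))
```

Since the normal approximation is stated to hold only for large samples, the standard score for a sample as small as this one should be read with caution.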

It may be noted, however, that the discarding of certain 
quantitative information again reduces the efficiency of the 
test, although it is contended that the loss of efficiency is not 
very great—not more than, say, 9 per cent. It is also to be 
noted that this method merely affords a test of the existence of 
correlation. It does not offer a substitute for the sampling distribution
of the Pearsonian correlation coefficient when it is to
be assumed that the population correlation coefficient is other
than zero.

1 "Rank Correlation and Tests of Significance Involving No Assumption
of Normality," Annals of Mathematical Statistics, Vol. 7 (1936), pp. 29-43.

It has also been suggested that the principle of rank correlation 
may be used to solve the problem of nonnormality in the analysis
of variance. 1 This would involve the replacement of the actual 
value of a case by its rank in the conventional analysis of variance 
table. With such a plan, the efficiency of the test is again 
reduced in comparison with the ordinary analysis of variance 
applied to a sample from a normal population. This may be
offset by the inaccuracy introduced when the usual test is applied 
to nonnormal populations. A great advantage of the rank 
method of handling analysis of variance is that it reduces greatly 
the amount of arithmetical computation involved in the test, 
since it converts all figures into simple small integers. 
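One concrete form of this idea, the statistic proposed in the Friedman paper cited above, ranks the entries within each row of a two-way table and compares the column rank totals through $\chi_r^2 = \frac{12}{nk(k+1)}\sum R_j^2 - 3n(k+1)$, which for n rows that are not too few is distributed approximately like χ² with k - 1 degrees of freedom. The sketch applies it to the cell means of Table 54 (teachers as rows, standing as columns) purely as an illustration; tied entries, if any, receive average ranks, and the function names are hypothetical.

```python
def average_ranks(values):
    """Ranks 1..k within one row, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Column rank totals and the chi-square-type statistic for ranks within rows."""
    n, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    totals = [sum(r[j] for r in rank_rows) for j in range(k)]
    stat = 12 / (n * k * (k + 1)) * sum(t * t for t in totals) - 3 * n * (k + 1)
    return totals, stat


# Cell means of Table 54: rows are teachers I-V; columns are high, medium, low standing.
CELL_MEANS = [
    [83.25, 77.50, 71.00],
    [88.75, 74.75, 70.00],
    [76.25, 67.25, 69.25],
    [78.75, 68.75, 62.25],
    [81.50, 75.75, 64.75],
]
totals, stat = friedman_statistic(CELL_MEANS)
print(totals, round(stat, 2))  # prints: [15.0, 9.0, 6.0] 8.4
```

The value 8.4 exceeds 5.99, the .05 point of χ² for 2 degrees of freedom, which agrees in direction with the conclusion about standing drawn from Table 56; with only five rows, however, the χ² approximation is rough.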

TCHEBYCHEFF'S AND OTHER INEQUALITIES

An interesting attack on the problem of nonnormality has been
through the establishment of upper limits of probability for
specified sampling fluctuations that are valid for almost any
type of sampling distribution. The best known of these attempts
is Tchebycheff's inequality. This says that the probability of a
variable deviating from its mean by as much as λσ is equal to or
less than $1/\lambda^2$. For example, if the mean of a variable is 100 and
its standard deviation is 10, the probability that the variable will
deviate from 100 by as much as 3σ, i.e., by 30 or more, is equal to
or less than 1/9, or about .11.

The basis for this conclusion is as follows. By definition,

$$\sigma^2 = p_1(X_1 - \bar{X})^2 + p_2(X_2 - \bar{X})^2 + \cdots + p_n(X_n - \bar{X})^2 \qquad (3)$$

Let X′, X″, X‴, … represent those values of the variable that differ from X̄ by as much
as λσ, and let p′, p″, p‴, … be their probabilities. Then, since the part is less than,
or at most equal to, the whole,

$$\sigma^2 \ge p'(X' - \bar{X})^2 + p''(X'' - \bar{X})^2 + \cdots \qquad (4)$$

or, if (λσ)² is put in place of (X′ − X̄)², (X″ − X̄)², …, each of which
is equal to or greater than (λσ)²,

1 Friedman, Milton, “The Use of Ranks to Avoid the Assumption of
Normality Implicit in the Analysis of Variance,” Journal of the American Statistical Association, Vol. 32 (1937),
pp. 675-701.
$$\sigma^2 \ge p'(\lambda\sigma)^2 + p''(\lambda\sigma)^2 + \cdots$$

and

$$\sigma^2 \ge (p' + p'' + \cdots)(\lambda\sigma)^2 \qquad (5)$$

so that, dividing both sides by (λσ)²,

$$P_{\lambda\sigma} \le \frac{1}{\lambda^2} \qquad (6)$$

where $P_{\lambda\sigma} = p' + p'' + \cdots$ is the probability of a deviation
equal to or greater than λσ.¹
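The steps from (3) to (6) can be checked on a small discrete distribution. The sketch below uses a hypothetical five-point distribution; its values and probabilities are invented for illustration. It works in exact fractions and returns five quantities:

- the variance (3);
- the partial sum on the right of (4);
- the same sum with every squared deviation replaced by (λσ)², as in (5);
- the probability $P_{\lambda\sigma}$;
- the limit 1/λ² of (6).

The function name is illustrative.

```python
from fractions import Fraction


def tchebychef_steps(values, probs, lam):
    """Quantities appearing in equations (3)-(6) for a discrete distribution.

    Exact rational arithmetic; a value counts as deviating by at least
    lam*sigma when (x - mean)**2 >= lam**2 * sigma**2, so no square root is needed.
    """
    lam = Fraction(lam)
    mean = sum(p * x for x, p in zip(values, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))              # (3)
    far = [(x, p) for x, p in zip(values, probs) if (x - mean) ** 2 >= lam ** 2 * var]
    partial = sum((p * (x - mean) ** 2 for x, p in far), Fraction(0))           # right side of (4)
    p_far = sum((p for _, p in far), Fraction(0))                               # P_{lambda sigma}
    replaced = p_far * lam ** 2 * var                                            # right side of (5)
    return {"variance": var, "partial": partial, "replaced": replaced,
            "P": p_far, "bound": 1 / lam ** 2}                                   # bound is (6)


values = [0, 2, 4, 6, 20]
probs = [Fraction(1, 10), Fraction(3, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
steps = tchebychef_steps(values, probs, 2)
print({name: float(q) for name, q in steps.items()})
```

The mean of this distribution is 5. With λ = 2, only the value 20 lies 2σ or more from the mean, so $P_{\lambda\sigma}$ = 1/10, well inside the limit of 1/4. The chain 28.2 ≥ 22.5 ≥ 11.28 reproduces inequalities (4) and (5).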

The problem of closer inequalities has been dealt with in recent
papers by several mathematicians. Camp, Guldberg, Meidell, and
Narumi have succeeded particularly well by placing certain mild
restrictions on the nature of the population function F(x). The
restrictions are such as to leave the distribution
function sufficiently general to be useful in the actual problems
of statistics. The main restriction placed on F(x) by Camp is
that it be a monotonic decreasing function of |x| when
|x| ≥ c, c ≥ 0. The general effect of this restriction is to exclude
distributions that are not represented by decreasing functions of
|x| at points more than a certain assigned distance from the
origin.

With the origin so chosen that zero is at the mean, Camp
reaches the general inequality²


where 


p . (?2s—2 

P xd = T 27 " 


\2s + \) 


1 + 


+ 


1 + <P 


\*2s 


and 


(c 2s V s 

V x 2s + 1 / 

(H 


(2s + 1) 


(7) 


1 Cf. Rietz, Henry Lewis, Mathematical Statistics (1936), pp. 28-30,
140-144; and Arne Fisher, The Mathematical Theory of Probabilities (1922),
pp. 108-109, 115-116. The original work on the subject is in Jules Bienaymé,
“Considérations à l’appui de la découverte de Laplace sur la loi de
probabilité dans la méthode des moindres carrés,” Comptes Rendus des
séances de l’Académie des Sciences, Vol. 37 (1853), pp. 309-324; P. L. De
Tchebychef, “Des valeurs moyennes,” translated from the Russian by
N. de Khanikof, Journal de mathématiques pures et appliquées, Deuxième
Série, Vol. 12 (1867), pp. 177-184; and P. Pizzetti, “I fondamenti matematici
per la critica dei risultati sperimentali,” Atti della Università di Genova (1892),
pp. 113-333.

2 Rietz, op. cit., pp. 140-144. 
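and $P_{c\sigma}$ is the probability of a deviation equal to or greater than cσ.
If c is 0, this reduces to

$$P_{\lambda\sigma} \le \frac{\beta_{2s-2}}{\lambda^{2s}}\left(\frac{2s}{2s+1}\right)^{2s} \qquad (8)$$

For s = 1 (β₀ = 1) this gives $P_{\lambda\sigma} \le 4/(9\lambda^2)$, a limit only four-ninths as large as Tchebychef’s 1/λ².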
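Inequality (8) can be evaluated directly. The sketch below assumes, as the notation suggests, that $\beta_{2s-2}$ stands for $\mu_{2s}/\sigma^{2s}$. On that reading $\beta_0 = 1$, and for s = 2 the constant is Pearson's β₂.

The limit applies only under Camp's restriction with c = 0: F(x) must decrease in |x| everywhere, with the origin at the mean. The function name is illustrative.

```python
def camp_meidell_bound(lam, s=1, beta=1.0):
    """Right side of (8): beta / lam**(2s) * (2s / (2s + 1))**(2s), capped at 1.

    beta is beta_{2s-2}, assumed here to mean mu_{2s} / sigma**(2s);
    it equals 1 when s = 1.
    """
    if lam <= 0 or s < 1:
        raise ValueError("need lam > 0 and s >= 1")
    ratio = (2 * s) / (2 * s + 1)
    return min(1.0, beta / lam ** (2 * s) * ratio ** (2 * s))


lam = 3.0
print(f"Tchebychef:          {1 / lam ** 2:.4f}")
print(f"(8), s = 1:          {camp_meidell_bound(lam):.4f}")
print(f"(8), s = 2, b2 = 3:  {camp_meidell_bound(lam, s=2, beta=3.0):.4f}")
```

With λ = 3 the three limits are about 0.111, 0.049 and 0.015. The last uses β₂ = 3, the value for a normal population. The higher-order limit is sharper, but only at the price of knowing a further population constant.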


The difficulty with the use of either Tchebychef’s formula or
its extensions is that their use usually depends on knowledge of
certain population parameters other than those for which
hypotheses are being tested. For example, the variance of the
sampling distribution of variances from any population equals
approximately

$$\frac{\sigma^4}{N}(\beta_2 - 1) \qquad (9)$$

From this it is evident that, to test a hypothesis regarding the
population standard deviation, knowledge must be had of the
population β₂. This greatly reduces the usefulness of inequalities
of this kind, since only limited knowledge regarding β₂ may be
available. All that can validly be done in cases like this is to
test various joint hypotheses regarding the population variance
and the population β₂.
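The sketch below illustrates the difficulty. It rests on two approximations: the large-sample formula (9), and applying Tchebychef's inequality to s² as though s² were centred on σ². The function names and the numerical example are hypothetical.

The same observed variance gives quite different probability limits under different values of β₂. The hypothesis actually being tested is therefore the pair (σ², β₂).

```python
import math


def approx_var_of_sample_variance(sigma, n, beta2):
    """Approximation (9): sigma**4 * (beta2 - 1) / n."""
    return sigma ** 4 * (beta2 - 1.0) / n


def tchebychef_limit_for_variance(s2, sigma, n, beta2):
    """Upper limit on the probability of a deviation of s2 from sigma**2 at least
    as large as the one observed, under the joint hypothesis (sigma, beta2)."""
    se = math.sqrt(approx_var_of_sample_variance(sigma, n, beta2))
    lam = abs(s2 - sigma ** 2) / se
    return 1.0 if lam <= 1.0 else 1.0 / lam ** 2


# Hypothesis sigma = 10; a sample of 100 gives s^2 = 160.
for beta2 in (3.0, 5.0, 9.0):
    print(beta2, round(tchebychef_limit_for_variance(160.0, 10.0, 100, beta2), 3))
```

The limit rises from about 0.056 to 0.222 as β₂ goes from 3 to 9. Whether the observed s² is "significant" thus depends on the β₂ assumed along with σ.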
Table I.—Four-place Common Logarithms of Numbers¹

N         0     1     2     3     4     5     6     7     8     9    10 | Tenths of the Tabular Difference
                                                                        |  1  2  3  4  5
1.0  0.0000  0043  0086  0128  0170  0212  0253  0294  0334  0374  0414 |
1.1    0414  0453  0492  0531  0569  0607  0645  0682  0719  0755  0792 |
1.2    0792  0828  0864  0899  0934  0969  1004  1038  1072  1106  1139 |
1.3    1139  1173  1206  1239  1271  1303  1335  1367  1399  1430  1461 |
1.4    1461  1492  1523  1553  1584  1614  1644  1673  1703  1732  1761 |
1.5    1761  1790  1818  1847  1875  1903  1931  1959  1987  2014  2041 |
1.6    2041  2068  2095  2122  2148  2175  2201  2227  2253  2279  2304 |
1.7    2304  2330  2355  2380  2405  2430  2455  2480  2504  2529  2553 |
1.8    2553  2577  2601  2625  2648  2672  2695  2718  2742  2765  2788 |
1.9    2788  2810  2833  2856  2878  2900  2923  2945  2967  2989  3010 |
2.0  0.3010  3032  3054  3075  3096  3118  3139  3160  3181  3201  3222 |  2  4  6  8 11
2.1    3222  3243  3263  3284  3304  3324  3345  3365  3385  3404  3424 |  2  4  6  8 10
2.2    3424  3444  3464  3483  3502  3522  3541  3560  3579  3598  3617 |  2  4  6  8 10
2.3    3617  3636  3655  3674  3692  3711  3729  3747  3766  3784  3802 |  2  4  5  7  9
2.4    3802  3820  3838  3856  3874  3892  3909  3927  3945  3962  3979 |  2  4  5  7  9
2.5    3979  3997  4014  4031  4048  4065  4082  4099  4116  4133  4150 |  2  3  5  7  9
2.6    4150  4166  4183  4200  4216  4232  4249  4265  4281  4298  4314 |  2  3  5  7  8
2.7    4314  4330  4346  4362  4378  4393  4409  4425  4440  4456  4472 |  2  3  5  6  8
2.8    4472  4487  4502  4518  4533  4548  4564  4579  4594  4609  4624 |  2  3  5  6  8
2.9    4624  4639  4654  4669  4683  4698  4713  4728  4742  4757  4771 |  1  3  4  6  7
3.0  0.4771  4786  4800  4814  4829  4843  4857  4871  4886  4900  4914 |  1  3  4  6  7
3.1    4914  4928  4942  4955  4969  4983  4997  5011  5024  5038  5051 |  1  3  4  6  7
3.2    5051  5065  5079  5092  5105  5119  5132  5145  5159  5172  5185 |  1  3  4  5  7
3.3    5185  5198  5211  5224  5237  5250  5263  5276  5289  5302  5315 |  1  3  4  5  6
3.4    5315  5328  5340  5353  5366  5378  5391  5403  5416  5428  5441 |  1  3  4  5  6
3.5    5441  5453  5465  5478  5490  5502  5514  5527  5539  5551  5563 |  1  2  4  5  6
3.6    5563  5575  5587  5599  5611  5623  5635  5647  5658  5670  5682 |  1  2  4  5  6
3.7    5682  5694  5705  5717  5729  5740  5752  5763  5775  5786  5798 |  1  2  3  5  6
3.8    5798  5809  5821  5832  5843  5855  5866  5877  5888  5899  5911 |  1  2  3  5  6
3.9    5911  5922  5933  5944  5955  5966  5977  5988  5999  6010  6021 |  1  2  3  4  6
4.0  0.6021  6031  6042  6053  6064  6075  6085  6096  6107  6117  6128 |  1  2  3  4  5
4.1    6128  6138  6149  6160  6170  6180  6191  6201  6212  6222  6232 |  1  2  3  4  5
4.2    6232  6243  6253  6263  6274  6284  6294  6304  6314  6325  6335 |  1  2  3  4  5
4.3    6335  6345  6355  6365  6375  6385  6395  6405  6415  6425  6435 |  1  2  3  4  5
4.4    6435  6444  6454  6464  6474  6484  6493  6503  6513  6522  6532 |  1  2  3  4  5
4.5    6532  6542  6551  6561  6571  6580  6590  6599  6609  6618  6628 |  1  2  3  4  5
4.6    6628  6637  6646  6656  6665  6675  6684  6693  6702  6712  6721 |  1  2  3  4  5
4.7    6721  6730  6739  6749  6758  6767  6776  6785  6794  6803  6812 |  1  2  3  4  5
4.8    6812  6821  6830  6839  6848  6857  6866  6875  6884  6893  6902 |  1  2  3  4  4
4.9    6902  6911  6920  6928  6937  6946  6955  6964  6972  6981  6990 |  1  2  3  4  4
5.0  0.6990  6998  7007  7016  7024  7033  7042  7050  7059  7067  7076 |  1  2  3  3  4
5.1    7076  7084  7093  7101  7110  7118  7126  7135  7143  7152  7160 |  1  2  3  3  4
5.2    7160  7168  7177  7185  7193  7202  7210  7218  7226  7235  7243 |  1  2  2  3  4
5.3    7243  7251  7259  7267  7275  7284  7292  7300  7308  7316  7324 |  1  2  2  3  4
5.4    7324  7332  7340  7348  7356  7364  7372  7380  7388  7396  7404 |  1  2  2  3  4


1 Taken, with permission, from E. V. Huntington’s Four Place Tables of Logarithms and
Trigonometric Functions (Harvard Cooperative Society, Inc., 1907).
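The "Tenths of the Tabular Difference" columns are aids to linear interpolation for a fourth figure. Take 2.347 as an example:

1. The entry for 2.34 is 0.3692.
2. The tabular difference to 2.35 is 19 units in the fourth place.
3. Seven tenths of 19 is about 13.
4. The result is 0.3692 + 0.0013 = 0.3705.

Below is a minimal Python sketch of that procedure. It assumes each printed entry equals log10 rounded to four places. The helper names are illustrative.

```python
import math


def table_log(x):
    """Four-place entry, assumed equal to log10(x) rounded to 4 decimals (1 <= x <= 10)."""
    return round(math.log10(x), 4)


def interpolate_log(x):
    """Four-place log of a four-figure number 1 <= x < 10 by linear interpolation.

    The first three figures locate the tabulated entry; the fourth figure,
    read as tenths, multiplies the tabular difference, which is then added.
    """
    if not 1.0 <= x < 10.0:
        raise ValueError("x must satisfy 1 <= x < 10")
    lower = math.floor(x * 100 + 1e-9) / 100      # e.g. 2.347 -> 2.34
    tenths = round((x - lower) * 1000)            # e.g. 2.347 -> 7
    lo, hi = table_log(lower), table_log(lower + 0.01)
    return round(lo + tenths / 10 * (hi - lo), 4)


print(interpolate_log(2.347), round(math.log10(2.347), 4))
```

Linear interpolation is adequate here because, above 2.0, the tabular differences change slowly from row to row. Between 1.0 and 2.0 they change quickly, which is why this part of the table gives no tenths and a three-figure table for 1.00 to 1.99 is supplied instead.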
Table I.—Four-place Common Logarithms of Numbers.—(Continued)

N         0     1     2     3     4     5     6     7     8     9    10 | Tenths of the Tabular Difference
                                                                        |  1  2  3  4  5
5.5    7404  7412  7419  7427  7435  7443  7451  7459  7466  7474  7482 |  1  2  2  3  4
5.6    7482  7490  7497  7505  7513  7520  7528  7536  7543  7551  7559 |  1  2  2  3  4
5.7    7559  7566  7574  7582  7589  7597  7604  7612  7619  7627  7634 |  1  2  2  3  4
5.8    7634  7642  7649  7657  7664  7672  7679  7686  7694  7701  7709 |  1  1  2  3  4
5.9    7709  7716  7723  7731  7738  7745  7752  7760  7767  7774  7782 |  1  1  2  3  4
6.0  0.7782  7789  7796  7803  7810  7818  7825  7832  7839  7846  7853 |  1  1  2  3  4
6.1    7853  7860  7868  7875  7882  7889  7896  7903  7910  7917  7924 |  1  1  2  3  4
6.2    7924  7931  7938  7945  7952  7959  7966  7973  7980  7987  7993 |  1  1  2  3  3
6.3    7993  8000  8007  8014  8021  8028  8035  8041  8048  8055  8062 |  1  1  2  3  3
6.4    8062  8069  8075  8082  8089  8096  8102  8109  8116  8122  8129 |  1  1  2  3  3
6.5    8129  8136  8142  8149  8156  8162  8169  8176  8182  8189  8195 |  1  1  2  3  3
6.6    8195  8202  8209  8215  8222  8228  8235  8241  8248  8254  8261 |  1  1  2  3  3
6.7    8261  8267  8274  8280  8287  8293  8299  8306  8312  8319  8325 |  1  1  2  3  3
6.8    8325  8331  8338  8344  8351  8357  8363  8370  8376  8382  8388 |  1  1  2  3  3
6.9    8388  8395  8401  8407  8414  8420  8426  8432  8439  8445  8451 |  1  1  2  3  3
7.0  0.8451  8457  8463  8470  8476  8482  8488  8494  8500  8506  8513 |  1  1  2  2  3
7.1    8513  8519  8525  8531  8537  8543  8549  8555  8561  8567  8573 |  1  1  2  2  3
7.2    8573  8579  8585  8591  8597  8603  8609  8615  8621  8627  8633 |  1  1  2  2  3
7.3    8633  8639  8645  8651  8657  8663  8669  8675  8681  8686  8692 |  1  1  2  2  3
7.4    8692  8698  8704  8710  8716  8722  8727  8733  8739  8745  8751 |  1  1  2  2  3
7.5    8751  8756  8762  8768  8774  8779  8785  8791  8797  8802  8808 |  1  1  2  2  3
7.6    8808  8814  8820  8825  8831  8837  8842  8848  8854  8859  8865 |  1  1  2  2  3
7.7    8865  8871  8876  8882  8887  8893  8899  8904  8910  8915  8921 |  1  1  2  2  3
7.8    8921  8927  8932  8938  8943  8949  8954  8960  8965  8971  8976 |  1  1  2  2  3
7.9    8976  8982  8987  8993  8998  9004  9009  9015  9020  9025  9031 |  1  1  2  2  3
8.0  0.9031  9036  9042  9047  9053  9058  9063  9069  9074  9079  9085 |  1  1  2  2  3
8.1    9085  9090  9096  9101  9106  9112  9117  9122  9128  9133  9138 |  1  1  2  2  3
8.2    9138  9143  9149  9154  9159  9165  9170  9175  9180  9186  9191 |  1  1  2  2  3
8.3    9191  9196  9201  9206  9212  9217  9222  9227  9232  9238  9243 |  1  1  2  2  3
8.4    9243  9248  9253  9258  9263  9269  9274  9279  9284  9289  9294 |  1  1  2  2  3
8.5    9294  9299  9304  9309  9315  9320  9325  9330  9335  9340  9345 |  1  1  2  2  3
8.6    9345  9350  9355  9360  9365  9370  9375  9380  9385  9390  9395 |  1  1  2  2  3
8.7    9395  9400  9405  9410  9415  9420  9425  9430  9435  9440  9445 |  0  1  1  2  2
8.8    9445  9450  9455  9460  9465  9469  9474  9479  9484  9489  9494 |  0  1  1  2  2
8.9    9494  9499  9504  9509  9513  9518  9523  9528  9533  9538  9542 |  0  1  1  2  2
9.0  0.9542  9547  9552  9557  9562  9566  9571  9576  9581  9586  9590 |  0  1  1  2  2
9.1    9590  9595  9600  9605  9609  9614  9619  9624  9628  9633  9638 |  0  1  1  2  2
9.2    9638  9643  9647  9652  9657  9661  9666  9671  9675  9680  9685 |  0  1  1  2  2
9.3    9685  9689  9694  9699  9703  9708  9713  9717  9722  9727  9731 |  0  1  1  2  2
9.4    9731  9736  9741  9745  9750  9754  9759  9763  9768  9773  9777 |  0  1  1  2  2
9.5    9777  9782  9786  9791  9795  9800  9805  9809  9814  9818  9823 |  0  1  1  2  2
9.6    9823  9827  9832  9836  9841  9845  9850  9854  9859  9863  9868 |  0  1  1  2  2
9.7    9868  9872  9877  9881  9886  9890  9894  9899  9903  9908  9912 |  0  1  1  2  2
9.8    9912  9917  9921  9926  9930  9934  9939  9943  9948  9952  9956 |  0  1  1  2  2
9.9    9956  9961  9965  9969  9974  9978  9983  9987  9991  9996  0000 |  0  1  1  2  2
APPENDIX

Table I.—Four-place Common Logarithms of Numbers.—(Continued)

N          0     1     2     3     4     5     6     7     8     9    10
1.00  0.0000  0004  0009  0013  0017  0022  0026  0030  0035  0039  0043
1.01    0043  0048  0052  0056  0060  0065  0069  0073  0077  0082  0086
1.02    0086  0090  0095  0099  0103  0107  0111  0116  0120  0124  0128
1.03    0128  0133  0137  0141  0145  0149  0154  0158  0162  0166  0170
1.04    0170  0175  0179  0183  0187  0191  0195  0199  0204  0208  0212
1.05    0212  0216  0220  0224  0228  0233  0237  0241  0245  0249  0253
1.06    0253  0257  0261  0265  0269  0273  0278  0282  0286  0290  0294
1.07    0294  0298  0302  0306  0310  0314  0318  0322  0326  0330  0334
1.08    0334  0338  0342  0346  0350  0354  0358  0362  0366  0370  0374
1.09    0374  0378  0382  0386  0390  0394  0398  0402  0406  0410  0414
1.10  0.0414  0418  0422  0426  0430  0434  0438  0441  0445  0449  0453
1.11    0453  0457  0461  0465  0469  0473  0477  0481  0484  0488  0492
1.12    0492  0496  0500  0504  0508  0512  0515  0519  0523  0527  0531
1.13    0531  0535  0538  0542  0546  0550  0554  0558  0561  0565  0569
1.14    0569  0573  0577  0580  0584  0588  0592  0596  0599  0603  0607
1.15    0607  0611  0615  0618  0622  0626  0630  0633  0637  0641  0645
1.16    0645  0648  0652  0656  0660  0663  0667  0671  0674  0678  0682
1.17    0682  0686  0689  0693  0697  0700  0704  0708  0711  0715  0719
1.18    0719  0722  0726  0730  0734  0737  0741  0745  0748  0752  0755
1.19    0755  0759  0763  0766  0770  0774  0777  0781  0785  0788  0792
1.20  0.0792  0795  0799  0803  0806  0810  0813  0817  0821  0824  0828
1.21    0828  0831  0835  0839  0842  0846  0849  0853  0856  0860  0864
1.22    0864  0867  0871  0874  0878  0881  0885  0888  0892  0896  0899
1.23    0899  0903  0906  0910  0913  0917  0920  0924  0927  0931  0934
1.24    0934  0938  0941  0945  0948  0952  0955  0959  0962  0966  0969
1.25    0969  0973  0976  0980  0983  0986  0990  0993  0997  1000  1004
1.26    1004  1007  1011  1014  1017  1021  1024  1028  1031  1035  1038
1.27    1038  1041  1045  1048  1052  1055  1059  1062  1065  1069  1072
1.28    1072  1075  1079  1082  1086  1089  1092  1096  1099  1103  1106
1.29    1106  1109  1113  1116  1119  1123  1126  1129  1133  1136  1139
1.30  0.1139  1143  1146  1149  1153  1156  1159  1163  1166  1169  1173
1.31    1173  1176  1179  1183  1186  1189  1193  1196  1199  1202  1206
1.32    1206  1209  1212  1216  1219  1222  1225  1229  1232  1235  1239
1.33    1239  1242  1245  1248  1252  1255  1258  1261  1265  1268  1271
1.34    1271  1274  1278  1281  1284  1287  1290  1294  1297  1300  1303
1.35    1303  1307  1310  1313  1316  1319  1323  1326  1329  1332  1335
1.36    1335  1339  1342  1345  1348  1351  1355  1358  1361  1364  1367
1.37    1367  1370  1374  1377  1380  1383  1386  1389  1392  1396  1399
1.38    1399  1402  1405  1408  1411  1414  1418  1421  1424  1427  1430
1.39    1430  1433  1436  1440  1443  1446  1449  1452  1455  1458  1461
1.40  0.1461  1464  1467  1471  1474  1477  1480  1483  1486  1489  1492
1.41    1492  1495  1498  1501  1504  1508  1511  1514  1517  1520  1523
1.42    1523  1526  1529  1532  1535  1538  1541  1544  1547  1550  1553
1.43    1553  1556  1559  1562  1565  1569  1572  1575  1578  1581  1584
1.44    1584  1587  1590  1593  1596  1599  1602  1605  1608  1611  1614
1.45    1614  1617  1620  1623  1626  1629  1632  1635  1638  1641  1644
1.46    1644  1647  1649  1652  1655  1658  1661  1664  1667  1670  1673
1.47    1673  1676  1679  1682  1685  1688  1691  1694  1697  1700  1703
1.48    1703  1706  1708  1711  1714  1717  1720  1723  1726  1729  1732
1.49    1732  1735  1738  1741  1744  1746  1749  1752  1755  1758  1761
Table I.—Four-place Common Logarithms of Numbers.—(Continued)

N          0     1     2     3     4     5     6     7     8     9    10
1.50  0.1761  1764  1767  1770  1772  1775  1778  1781  1784  1787  1790
1.51    1790  1793  1796  1798  1801  1804  1807  1810  1813  1816  1818
1.52    1818  1821  1824  1827  1830  1833  1836  1838  1841  1844  1847
1.53    1847  1850  1853  1855  1858  1861  1864  1867  1870  1872  1875
1.54    1875  1878  1881  1884  1886  1889  1892  1895  1898  1901  1903
1.55    1903  1906  1909  1912  1915  1917  1920  1923  1926  1928  1931
1.56    1931  1934  1937  1940  1942  1945  1948  1951  1953  1956  1959
1.57    1959  1962  1965  1967  1970  1973  1976  1978  1981  1984  1987
1.58    1987  1989  1992  1995  1998  2000  2003  2006  2009  2011  2014
1.59    2014  2017  2019  2022  2025  2028  2030  2033  2036  2038  2041
1.60  0.2041  2044  2047  2049  2052  2055  2057  2060  2063  2066  2068
1.61    2068  2071  2074  2076  2079  2082  2084  2087  2090  2092  2095
1.62    2095  2098  2101  2103  2106  2109  2111  2114  2117  2119  2122
1.63    2122  2125  2127  2130  2133  2135  2138  2140  2143  2146  2148
1.64    2148  2151  2154  2156  2159  2162  2164  2167  2170  2172  2175
1.65    2175  2177  2180  2183  2185  2188  2191  2193  2196  2198  2201
1.66    2201  2204  2206  2209  2212  2214  2217  2219  2222  2225  2227
1.67    2227  2230  2232  2235  2238  2240  2243  2245  2248  2251  2253
1.68    2253  2256  2258  2261  2263  2266  2269  2271  2274  2276  2279
1.69    2279  2281  2284  2287  2289  2292  2294  2297  2299  2302  2304
1.70  0.2304  2307  2310  2312  2315  2317  2320  2322  2325  2327  2330
1.71    2330  2333  2335  2338  2340  2343  2345  2348  2350  2353  2355
1.72    2355  2358  2360  2363  2365  2368  2370  2373  2375  2378  2380
1.73    2380  2383  2385  2388  2390  2393  2395  2398  2400  2403  2405
1.74    2405  2408  2410  2413  2415  2418  2420  2423  2425  2428  2430
1.75    2430  2433  2435  2438  2440  2443  2445  2448  2450  2453  2455
1.76    2455  2458  2460  2463  2465  2467  2470  2472  2475  2477  2480
1.77    2480  2482  2485  2487  2490  2492  2494  2497  2499  2502  2504
1.78    2504  2507  2509  2512  2514  2516  2519  2521  2524  2526  2529
1.79    2529  2531  2533  2536  2538  2541  2543  2545  2548  2550  2553
1.80  0.2553  2555  2558  2560  2562  2565  2567  2570  2572  2574  2577
1.81    2577  2579  2582  2584  2586  2589  2591  2594  2596  2598  2601
1.82    2601  2603  2605  2608  2610  2613  2615  2617  2620  2622  2625
1.83    2625  2627  2629  2632  2634  2636  2639  2641  2643  2646  2648
1.84    2648  2651  2653  2655  2658  2660  2662  2665  2667  2669  2672
1.85    2672  2674  2676  2679  2681  2683  2686  2688  2690  2693  2695
1.86    2695  2697  2700  2702  2704  2707  2709  2711  2714  2716  2718
1.87    2718  2721  2723  2725  2728  2730  2732  2735  2737  2739  2742
1.88    2742  2744  2746  2749  2751  2753  2755  2758  2760  2762  2765
1.89    2765  2767  2769  2772  2774  2776  2778  2781  2783  2785  2788
1.90  0.2788  2790  2792  2794  2797  2799  2801  2804  2806  2808  2810
1.91    2810  2813  2815  2817  2819  2822  2824  2826  2828  2831  2833
1.92    2833  2835  2838  2840  2842  2844  2847  2849  2851  2853  2856
1.93    2856  2858  2860  2862  2865  2867  2869  2871  2874  2876  2878
1.94    2878  2880  2882  2885  2887  2889  2891  2894  2896  2898  2900
1.95    2900  2903  2905  2907  2909  2911  2914  2916  2918  2920  2923
1.96    2923  2925  2927  2929  2931  2934  2936  2938  2940  2942  2945
1.97    2945  2947  2949  2951  2953  2956  2958  2960  2962  2964  2967
1.98    2967  2969  2971  2973  2975  2978  2980  2982  2984  2986  2989
1.99    2989  2991  2993  2995  2997  2999  3002  3004  3006  3008  3010
Table II.—Squares of Numbers¹

N         0       1       2       3       4       5       6       7       8       9
100   10000   10201   10404   10609   10816   11025   11236   11449   11664   11881
110   12100   12321   12544   12769   12996   13225   13456   13689   13924   14161
120   14400   14641   14884   15129   15376   15625   15876   16129   16384   16641
130   16900   17161   17424   17689   17956   18225   18496   18769   19044   19321
140   19600   19881   20164   20449   20736   21025   21316   21609   21904   22201
150   22500   22801   23104   23409   23716   24025   24336   24649   24964   25281
160   25600   25921   26244   26569   26896   27225   27556   27889   28224   28561
170   28900   29241   29584   29929   30276   30625   30976   31329   31684   32041
180   32400   32761   33124   33489   33856   34225   34596   34969   35344   35721
190   36100   36481   36864   37249   37636   38025   38416   38809   39204   39601
200   40000   40401   40804   41209   41616   42025   42436   42849   43264   43681
210   44100   44521   44944   45369   45796   46225   46656   47089   47524   47961
220   48400   48841   49284   49729   50176   50625   51076   51529   51984   52441
230   52900   53361   53824   54289   54756   55225   55696   56169   56644   57121
240   57600   58081   58564   59049   59536   60025   60516   61009   61504   62001
250   62500   63001   63504   64009   64516   65025   65536   66049   66564   67081
260   67600   68121   68644   69169   69696   70225   70756   71289   71824   72361
270   72900   73441   73984   74529   75076   75625   76176   76729   77284   77841
280   78400   78961   79524   80089   80656   81225   81796   82369   82944   83521
290   84100   84681   85264   85849   86436   87025   87616   88209   88804   89401
300   90000   90601   91204   91809   92416   93025   93636   94249   94864   95481
310   96100   96721   97344   97969   98596   99225   99856  100489  101124  101761
320  102400  103041  103684  104329  104976  105625  106276  106929  107584  108241
330  108900  109561  110224  110889  111556  112225  112896  113569  114244  114921
340  115600  116281  116964  117649  118336  119025  119716  120409  121104  121801
350  122500  123201  123904  124609  125316  126025  126736  127449  128164  128881
360  129600  130321  131044  131769  132496  133225  133956  134689  135424  136161
370  136900  137641  138384  139129  139876  140625  141376  142129  142884  143641
380  144400  145161  145924  146689  147456  148225  148996  149769  150544  151321
390  152100  152881  153664  154449  155236  156025  156816  157609  158404  159201
400  160000  160801  161604  162409  163216  164025  164836  165649  166464  167281
410  168100  168921  169744  170569  171396  172225  173056  173889  174724  175561
420  176400  177241  178084  178929  179776  180625  181476  182329  183184  184041
430  184900  185761  186624  187489  188356  189225  190096  190969  191844  192721
440  193600  194481  195364  196249  197136  198025  198916  199809  200704  201601
450  202500  203401  204304  205209  206116  207025  207936  208849  209764  210681
460  211600  212521  213444  214369  215296  216225  217156  218089  219024  219961
470  220900  221841  222784  223729  224676  225625  226576  227529  228484  229441
480  230400  231361  232324  233289  234256  235225  236196  237169  238144  239121
490  240100  241081  242064  243049  244036  245025  246016  247009  248004  249001
500  250000  251001  252004  253009  254016  255025  256036  257049  258064  259081
510  260100  261121  262144  263169  264196  265225  266256  267289  268324  269361
520  270400  271441  272484  273529  274576  275625  276676  277729  278784  279841
530  280900  281961  283024  284089  285156  286225  287296  288369  289444  290521
540  291600  292681  293764  294849  295936  297025  298116  299209  300304  301401


1 Source: Waugh, Albert E., Laboratory Manual and Problems for Elements of Statistical
Method (McGraw-Hill Book Company, Inc., 1944).
Table II.—Squares of Numbers.—(Continued)

N         0       1       2       3       4       5       6       7       8       9
550  302500  303601  304704  305809  306916  308025  309136  310249  311364  312481
560  313600  314721  315844  316969  318096  319225  320356  321489  322624  323761
570  324900  326041  327184  328329  329476  330625  331776  332929  334084  335241
580  336400  337561  338724  339889  341056  342225  343396  344569  345744  346921
590  348100  349281  350464  351649  352836  354025  355216  356409  357604  358801
600  360000  361201  362404  363609  364816  366025  367236  368449  369664  370881
610  372100  373321  374544  375769  376996  378225  379456  380689  381924  383161
620  384400  385641  386884  388129  389376  390625  391876  393129  394384  395641
630  396900  398161  399424  400689  401956  403225  404496  405769  407044  408321
640  409600  410881  412164  413449  414736  416025  417316  418609  419904  421201
650  422500  423801  425104  426409  427716  429025  430336  431649  432964  434281
660  435600  436921  438244  439569  440896  442225  443556  444889  446224  447561
670  448900  450241  451584  452929  454276  455625  456976  458329  459684  461041
680  462400  463761  465124  466489  467856  469225  470596  471969  473344  474721
690  476100  477481  478864  480249  481636  483025  484416  485809  487204  488601
700  490000  491401  492804  494209  495616  497025  498436  499849  501264  502681
710  504100  505521  506944  508369  509796  511225  512656  514089  515524  516961
720  518400  519841  521284  522729  524176  525625  527076  528529  529984  531441
730  532900  534361  535824  537289  538756  540225  541696  543169  544644  546121
740  547600  549081  550564  552049  553536  555025  556516  558009  559504  561001
750  562500  564001  565504  567009  568516  570025  571536  573049  574564  576081
760  577600  579121  580644  582169  583696  585225  586756  588289  589824  591361
770  592900  594441  595984  597529  599076  600625  602176  603729  605284  606841
780  608400  609961  611524  613089  614656  616225  617796  619369  620944  622521
790  624100  625681  627264  628849  630436  632025  633616  635209  636804  638401
800  640000  641601  643204  644809  646416  648025  649636  651249  652864  654481
810  656100  657721  659344  660969  662596  664225  665856  667489  669124  670761
820  672400  674041  675684  677329  678976  680625  682276  683929  685584  687241
830  688900  690561  692224  693889  695556  697225  698896  700569  702244  703921
840  705600  707281  708964  710649  712336  714025  715716  717409  719104  720801
850  722500  724201  725904  727609  729316  731025  732736  734449  736164  737881
860  739600  741321  743044  744769  746496  748225  749956  751689  753424  755161
870  756900  758641  760384  762129  763876  765625  767376  769129  770884  772641
880  774400  776161  777924  779689  781456  783225  784996  786769  788544  790321
890  792100  793881  795664  797449  799236  801025  802816  804609  806404  808201
900  810000  811801  813604  815409  817216  819025  820836  822649  824464  826281
910  828100  829921  831744  833569  835396  837225  839056  840889  842724  844561
920  846400  848241  850084  851929  853776  855625  857476  859329  861184  863041
930  864900  866761  868624  870489  872356  874225  876096  877969  879844  881721
940  883600  885481  887364  889249  891136  893025  894916  896809  898704  900601
950  902500  904401  906304  908209  910116  912025  913936  915849  917764  919681
960  921600  923521  925444  927369  929296  931225  933156  935089  937024  938961
970  940900  942841  944784  946729  948676  950625  952576  954529  956484  958441
980  960400  962361  964324  966289  968256  970225  972196  974169  976144  978121
990  980100  982081  984064  986049  988036  990025  992016  994009  996004  998001
Table III.—Square Roots of Numbers from 10 to 100.—(Continued)
Table IV.—Square Roots of Numbers from 100 to 1000¹

N        0      1      2      3      4      5      6      7      8      9
100  10.00  10.05  10.10  10.15  10.20  10.25  10.30  10.34  10.39  10.44
110  10.49  10.54  10.58  10.63  10.68  10.72  10.77  10.82  10.86  10.91
120  10.95  11.00  11.05  11.09  11.14  11.18  11.22  11.27  11.31  11.36
130  11.40  11.45  11.49  11.53  11.58  11.62  11.66  11.70  11.75  11.79
140  11.83  11.87  11.92  11.96  12.00  12.04  12.08  12.12  12.17  12.21
150  12.25  12.29  12.33  12.37  12.41  12.45  12.49  12.53  12.57  12.61
160  12.65  12.69  12.73  12.77  12.81  12.85  12.88  12.92  12.96  13.00
170  13.04  13.08  13.11  13.15  13.19  13.23  13.27  13.30  13.34  13.38
180  13.42  13.45  13.49  13.53  13.56  13.60  13.64  13.67  13.71  13.75
190  13.78  13.82  13.86  13.89  13.93  13.96  14.00  14.04  14.07  14.11
200  14.14  14.18  14.21  14.25  14.28  14.32  14.35  14.39  14.42  14.46
210  14.49  14.53  14.56  14.59  14.63  14.66  14.70  14.73  14.76  14.80
220  14.83  14.87  14.90  14.93  14.97  15.00  15.03  15.07  15.10  15.13
230  15.17  15.20  15.23  15.26  15.30  15.33  15.36  15.39  15.43  15.46
240  15.49  15.52  15.56  15.59  15.62  15.65  15.68  15.72  15.75  15.78
250  15.81  15.84  15.87  15.91  15.94  15.97  16.00  16.03  16.06  16.09
260  16.12  16.16  16.19  16.22  16.25  16.28  16.31  16.34  16.37  16.40
270  16.43  16.46  16.49  16.52  16.55  16.58  16.61  16.64  16.67  16.70
280  16.73  16.76  16.79  16.82  16.85  16.88  16.91  16.94  16.97  17.00
290  17.03  17.06  17.09  17.12  17.15  17.18  17.20  17.23  17.26  17.29
300  17.32  17.35  17.38  17.41  17.44  17.46  17.49  17.52  17.55  17.58
310  17.61  17.64  17.66  17.69  17.72  17.75  17.78  17.80  17.83  17.86
320  17.89  17.92  17.94  17.97  18.00  18.03  18.06  18.08  18.11  18.14
330  18.17  18.19  18.22  18.25  18.28  18.30  18.33  18.36  18.38  18.41
340  18.44  18.47  18.49  18.52  18.55  18.57  18.60  18.63  18.65  18.68
350  18.71  18.74  18.76  18.79  18.81  18.84  18.87  18.89  18.92  18.95
360  18.97  19.00  19.03  19.05  19.08  19.10  19.13  19.16  19.18  19.21
370  19.24  19.26  19.29  19.31  19.34  19.36  19.39  19.42  19.44  19.47
380  19.49  19.52  19.54  19.57  19.60  19.62  19.65  19.67  19.70  19.72
390  19.75  19.77  19.80  19.82  19.85  19.87  19.90  19.92  19.95  19.98
400  20.00  20.02  20.05  20.07  20.10  20.12  20.15  20.17  20.20  20.22
410  20.25  20.27  20.30  20.32  20.35  20.37  20.40  20.42  20.44  20.47
420  20.49  20.52  20.54  20.57  20.59  20.62  20.64  20.66  20.69  20.71

430 

20.74 

20.76 

20.78 

20.81 

20.83 

20.86 

20.88 

20.90 

20.93 

20.95 

440 

20.98 

21.00 

21.02 

21.05 

21.07 

21.10 

21.12 

21.14 

21.17 

21.19 

4.50 

21.21 

21.24 

21.26 

21.28 

21.31 

21.33 

21.35 

21.38 

21.40 

21.42 

460 

21.45 

21.47 

21.49 

21.52 

21.54 

21.56 

21.59 

21.61 

21.63 

21.66 

470 

21.68 

21.70 

21.73 

21.75 

21.77 

21.79 

21.82 

21.84 

21.86 

21.89 

480 

21.91 

21.93 

21.95 

21.98 

22.00 

22.02 

22.05 

22.07 

22.09 

22.11 

490 

22.14 

22.16 

22.18 

22.20 

22.23 

22.25 

22.27 

22.29 

22.32 

22.34 

500 

22.36 

22.38 

22.41 

22.43 

22.45 

22.47 

22.49 

22.52 

22.54 

22.56 

510 

22.58 

22.01 

22.63 

22.65 

22.67 

22.69 

22.72 

22.74 

22.76 

22.78 

520 

22.80 

22.83 

22.85 

22.87 

22.89 

22.91 

22.93 

22.96 

22.98 

23.00 

530 

23.02 

23.04 

23.07 

23.09 

23.11 

23.13 

23.15 

23.17 

23.19 

23.22 

540 

23.24 

23.26 

23.28 

23.30 

23.32 

23.35 

23.37 

23.39 

23.41 

23.43 

550 

23.45 

23.47 

23.49 

23,52 

23.54 

23.56 

23.58 

23.60 

23.62 

23.64 


¹ Source: Waugh, Albert E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).
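Each entry of Table IV appears to be the square root of N + d rounded to two decimals, where N is the row label and d the column heading. The Python sketch below regenerates a row under that reading. It is a checking aid, not the procedure used to compute the printed table, and the name `sqrt_row` is illustrative.

```python
import math


def sqrt_row(n_start):
    """Return sqrt(n_start + d) rounded to 2 decimals for column headings d = 0..9."""
    return [round(math.sqrt(n_start + d), 2) for d in range(10)]


if __name__ == "__main__":
    print(sqrt_row(100))  # the 100 row of Table IV
```

For instance, `sqrt_row(510)[1]` gives 22.61, the square root of 511 to two places.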
Table IV. Square Roots of Numbers from 100 to 1000 (Continued)

  N    0      1      2      3      4      5      6      7      8      9
550  23.45
560  23.66
570  23.87
580  24.08
590  24.29
600  24.49
610  24.70
620  24.90
630  25.10
640  25.30
700  26.46
710  26.65
720  26.83
730  27.02
740  27.20
800                              28.35  28.37
810                              28.53  28.55
820                              28.71  28.72
830                              28.88  28.90
840                              29.05  29.07
850                                                          29.29  29.31
860                                                          29.46  29.48
900  30.00  30.02
910  30.17  30.18
920  30.33  30.35
930  30.50  30.51
940  30.66  30.68
950  30.82  30.84                30.89  30.90
960  30.98  31.00                31.05  31.06
970  31.14  31.16                31.21  31.22
980  31.30  31.32                31.37  31.38
990  31.46  31.48                31.53  31.54
Table V. Reciprocals of Numbers¹

  N      .09
1.00   .9174
1.10   .8403
1.20   .7752
1.30   .7194
1.40   .6711
1.50   .6289
1.60   .5917
1.70   .5587
1.80   .5291
1.90   .5025
2.00   .4785
2.10   .4566
2.20   .4367
2.30   .4184
2.40   .4016
2.50   .3861
2.60   .3717
2.70   .3584
2.80   .3460
2.90   .3344
3.00   .3236
3.10   .3135
3.20   .3040
3.30   .2950
3.40   .2865
3.50   .2786
3.60   .2710
3.70   .2639
3.80   .2571
3.90   .2506
4.00   .2445
4.10   .2387
4.20   .2331
4.30   .2278
4.40   .2227
4.50   .2179
4.60   .2132
4.70   .2088
4.80   .2045
4.90   .2004
5.00   .1965
5.10   .1927
5.20   .1890
5.30   .1855
5.40   .1821


¹ Source: Waugh, Albert E., Laboratory Manual and Problems for Elements of Statistical Method (McGraw-Hill Book Company, Inc., 1944).
Table V. Reciprocals of Numbers (Continued)

  N     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
5.50  .1818  .1815  .1812  .1808  .1805  .1802  .1799  .1795  .1792  .1789
5.60  .1786  .1783  .1779  .1776  .1773  .1770  .1767  .1764  .1761  .1757
5.70  .1754  .1751  .1748  .1745  .1742  .1739  .1736  .1733  .1730  .1727
5.80  .1724  .1721  .1718  .1715  .1712  .1709  .1706  .1704  .1701  .1698
5.90  .1695  .1692  .1689  .1686  .1684  .1681  .1678  .1675  .1672  .1669
6.00  .1667  .1664  .1661  .1658  .1656  .1653  .1650  .1647  .1645  .1642
6.10  .1639  .1637  .1634  .1631  .1629  .1626  .1623  .1621  .1618  .1616
6.20  .1613  .1610  .1608  .1605  .1603  .1600  .1597  .1595  .1592  .1590
6.30  .1587  .1585  .1582  .1580  .1577  .1575  .1572  .1570  .1567  .1565
6.40  .1562  .1560  .1558  .1555  .1553  .1550  .1548  .1546  .1543  .1541
6.50  .1538  .1536  .1534  .1531  .1529  .1527  .1524  .1522  .1520  .1517
6.60  .1515  .1513  .1511  .1508  .1506  .1504  .1502  .1499  .1497  .1495
6.70  .1493  .1490  .1488  .1486  .1484  .1481  .1479  .1477  .1475  .1473
6.80  .1471  .1468  .1466  .1464  .1462  .1460  .1458  .1456  .1453  .1451
6.90  .1449  .1447  .1445  .1443  .1441  .1439  .1437  .1435  .1433  .1431
7.00  .1429  .1427  .1424  .1422  .1420  .1418  .1416  .1414  .1412  .1410
7.10  .1408  .1406  .1404  .1403  .1401  .1399  .1397  .1395  .1393  .1391
7.20  .1389  .1387  .1385  .1383  .1381  .1379  .1377  .1376  .1374  .1372
7.30  .1370  .1368  .1366  .1364  .1362  .1361  .1359  .1357  .1355  .1353
7.40  .1351  .1350  .1348  .1346  .1344  .1342  .1340  .1339  .1337  .1335
7.50  .1333  .1332  .1330  .1328  .1326  .1324  .1323  .1321  .1319  .1318
7.60  .1316  .1314  .1312  .1311  .1309  .1307  .1305  .1304  .1302  .1300
7.70  .1299  .1297  .1295  .1294  .1292  .1290  .1289  .1287  .1285  .1284
7.80  .1282  .1280  .1279  .1277  .1276  .1274  .1272  .1271  .1269  .1267
7.90  .1266  .1264  .1263  .1261  .1259  .1258  .1256  .1255  .1253  .1252
8.00  .1250  .1248  .1247  .1245  .1244  .1242  .1241  .1239  .1238  .1236
8.10  .1235  .1233  .1232  .1230  .1228  .1227  .1225  .1224  .1222  .1221
8.20  .1220  .1218  .1217  .1215  .1214  .1212  .1211  .1209  .1208  .1206
8.30  .1205  .1203  .1202  .1200  .1199  .1198  .1196  .1195  .1193  .1192
8.40  .1190  .1189  .1188  .1186  .1185  .1183  .1182  .1181  .1179  .1178
8.50  .1176  .1175  .1174  .1172  .1171  .1170  .1168  .1167  .1166  .1164
8.60  .1163  .1161  .1160  .1159  .1157  .1156  .1155  .1153  .1152  .1151
8.70  .1149  .1148  .1147  .1145  .1144  .1143  .1142  .1140  .1139  .1138
8.80  .1136  .1135  .1134  .1132  .1131  .1130  .1129  .1127  .1126  .1125
8.90  .1124  .1122  .1121  .1120  .1119  .1117  .1116  .1115  .1114  .1112
9.00  .1111  .1110  .1109  .1107  .1106  .1105  .1104  .1103  .1101  .1100
9.10  .1099  .1098  .1096  .1095  .1094  .1093  .1092  .1091  .1089  .1088
9.20  .1087  .1086  .1085  .1083  .1082  .1081  .1080  .1079  .1078  .1076
9.30  .1075  .1074  .1073  .1072  .1071  .1070  .1068  .1067  .1066  .1065
9.40  .1064  .1063  .1062  .1060  .1059  .1058  .1057  .1056  .1055  .1054
9.50  .1053  .1052  .1050  .1049  .1048  .1047  .1046  .1045  .1044  .1043
9.60  .1042  .1041  .1040  .1038  .1037  .1036  .1035  .1034  .1033  .1032
9.70  .1031  .1030  .1029  .1028  .1027  .1026  .1025  .1024  .1022  .1021
9.80  .1020  .1019  .1018  .1017  .1016  .1015  .1014  .1013  .1012  .1011
9.90  .1010  .1009  .1008  .1007  .1006  .1005  .1004  .1003  .1002  .1001
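Table V gives 1/N to four decimals, N being the row label plus the column heading in hundredths. A minimal Python sketch that regenerates one row follows. It assumes round-half-to-even at exact ties, which is an assumption since the table does not state its convention (this choice does reproduce the printed .1562 for 1/6.40 = .15625). Using `Fraction` with integer hundredths keeps N exact; `reciprocal_row` is an illustrative name.

```python
from fractions import Fraction


def reciprocal_row(row_hundredths):
    """Return 1/N to 4 decimals for N = (row_hundredths + d) / 100, d = 0..9.

    row_hundredths is the row label times 100, e.g. 550 for the 5.50 row.
    """
    values = []
    for d in range(10):
        n = Fraction(row_hundredths + d, 100)   # exact N, e.g. 552/100 for 5.52
        values.append(float(round(1 / n, 4)))   # exact rounding, ties to even
    return values


if __name__ == "__main__":
    print(reciprocal_row(550))  # the 5.50 row of Table V
```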













Table VI. Areas, Ordinates, and Derivatives of the Normal Curve¹

(1)      (2)       (3)       (4)       (5)
x/σ     Area  Ordinate  φ₃(x/σ)*  φ₄(x/σ)*
.00              .3989     .0000    1.1968
.01              .3989     .0120    1.1965
.02              .3989     .0239    1.1956
.03              .3988     .0359    1.1941
.04              .3986     .0478    1.1920
.05    .0199     .3984     .0597    1.1894
.06    .0239     .3982     .0716    1.1861
.07    .0279     .3980     .0834    1.1822
.08    .0319     .3977     .0952    1.1778
.09    .0359     .3973     .1070    1.1727
.10    .0398     .3970     .1187    1.1671
.11    .0438     .3965     .1303    1.1609
.12    .0478     .3961     .1419    1.1541
.13    .0517     .3956     .1534    1.1468
.14    .0557     .3951     .1648    1.1389
.25    .0987     .3867     .2840    1.0165
.26    .1026     .3857     .2941    1.0024
.27    .1064     .3847     .3040    0.9878
.28    .1103     .3836     .3138    0.9727
.29    .1141     .3825     .3235    0.9572
.30    .1179     .3814     .3330    0.9413
.31    .1217     .3802     .3423    0.9250
.32    .1255     .3790     .3515    0.9082
.33    .1293     .3778     .3605    0.8910
.34    .1331     .3765     .3693    0.8735
.35    .1368     .3752     .3779    0.8556
.36    .1406     .3739     .3864    0.8373
.37    .1443     .3726     .3947    0.8186
.38    .1480     .3712     .4028    0.7996
.39    .1517     .3697     .4107    0.7803
.40    .1554     .3683     .4184    0.7607
.41    .1591     .3668     .4259    0.7408
.42    .1628     .3653     .4332    0.7206
.43    .1664     .3637     .4403    0.7001
.44    .1700     .3621     .4472    0.6793
.45    .1736     .3605     .4539    0.6583
.46    .1772     .3589     .4603    0.6371
.47    .1808     .3572     .4666    0.6156
.48    .1844     .3555     .4727    0.5940
.49    .1879     .3538     .4785    0.5721
.50    .1915     .3521     .4841    0.5501
.51              .3503     .4895     .5279
.52              .3485     .4947     .5056
.53              .3467     .4996     .4831
.54              .3448     .5043     .4605
.55    .2088     .3429     .5088     .4378
.56    .2123     .3411     .5131     .4150
.57    .2157     .3391     .5171     .3921
.58    .2190     .3372     .5209     .3691
.59    .2224     .3352     .5245     .3461
.60    .2258     .3332
.61    .2291     .3312
.62    .2324     .3292
.63    .2357     .3271
.64    .2389     .3251
.65    .2422
.66    .2454
.67    .2486
.68    .2518
.69    .2549
.70    .2580
.71    .2612
.72    .2642
.73    .2673
.74    .2704
.75    .2734     .3011     .5505    -.0176
.76    .2764     .2989     .5502    -.0394
.77    .2794     .2966     .5497    -.0611
.78    .2823     .2943     .5490    -.0825
.79    .2852     .2920     .5481    -.1037
.80    .2881     .2897     .5469    -.1247
.81    .2910     .2874     .5456    -.1455
.82    .2939     .2850     .5440    -.1660
.83    .2967     .2827     .5423    -.1862
.84    .2996     .2803     .5403    -.2063
.85    .3023     .2780     .5381    -.2260
.86    .3051     .2756     .5358    -.2455
.87    .3079     .2732     .5332    -.2646
.88    .3106     .2709     .5305    -.2835
.89    .3133     .2685     .5276    -.3021
.90    .3159     .2661     .5245    -.3203
.91    .3186     .2637     .5212    -.3383
.92    .3212     .2613     .5177    -.3559
.93    .3238     .2589     .5140    -.3731
.94    .3264     .2565     .5102    -.3901
.95    .3289     .2541     .5062    -.4066
.96    .3315     .2516     .5021    -.4228
.97    .3340     .2492     .4978    -.4387
.98    .3365     .2468     .4933    -.4541
.99    .3389     .2444     .4887    -.4692
1.00   .3413     .2420     .4839    -.4839


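* If the ordinate shown in column (3) is designated as $\varphi_0(x/\sigma)$, then $\varphi_3(x/\sigma)$ is defined as $\varphi_0(x/\sigma)$ multiplied by $\left(3\frac{x}{\sigma} - \frac{x^3}{\sigma^3}\right)$, and $\varphi_4(x/\sigma)$ is defined as $\varphi_0(x/\sigma)$ multiplied by $\left(3 - 6\frac{x^2}{\sigma^2} + \frac{x^4}{\sigma^4}\right)$. By successive differentiation of $\varphi_0(x/\sigma)$ it can be shown that $\varphi_3(x/\sigma)$ is also the third derivative of $\varphi_0(x/\sigma)$ and $\varphi_4(x/\sigma)$ is the fourth derivative of $\varphi_0(x/\sigma)$ (see Chap. VII, pp. 142-145).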
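The successive differentiation referred to in the footnote can be written out explicitly. Take $t = x/\sigma$ and assume the ordinate is the standard normal density $\varphi_0(t) = e^{-t^2/2}/\sqrt{2\pi}$, which gives the .3989 tabled at $t = 0$. Each step uses $\varphi_0'(t) = -t\,\varphi_0(t)$:

$$
\begin{aligned}
\varphi_0'(t) &= -t\,\varphi_0(t),\\
\varphi_0''(t) &= (t^2 - 1)\,\varphi_0(t),\\
\varphi_0'''(t) &= (3t - t^3)\,\varphi_0(t) = \varphi_3(t),\\
\varphi_0^{(4)}(t) &= (t^4 - 6t^2 + 3)\,\varphi_0(t) = \varphi_4(t).
\end{aligned}
$$

At $t = 1$ the two multipliers are $+2$ and $-2$, so $\varphi_3(1) = 2(.24197) \approx .4839$ and $\varphi_4(1) \approx -.4839$, as in the 1.00 row of the table.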
Table VI. Areas, Ordinates, and Derivatives of the Normal Curve (Continued)

(1)      (2)       (3)       (4)       (5)
x/σ     Area  Ordinate  φ₃(x/σ)*  φ₄(x/σ)*
1.00   .3413     .2420     .4839    -.4839
1.01                       .4790    -.4983
1.02                       .4740    -.5122
1.03   .3485     .2347     .4688    -.5257
1.04   .3508     .2323     .4635    -.5389
1.05   .3531     .2299     .4580    -.5516
1.06   .3554     .2275     .4524    -.5639
1.07   .3577     .2251     .4467    -.5758
1.08   .3599     .2227     .4409    -.5873
1.09                       .4350    -.5984
1.10                       .4290    -.6091
1.11                       .4228    -.6193
1.12                       .4166    -.6292
1.13                       .4102    -.6386
1.14                       .4038    -.6476
1.30   .4032     .1714     .2918
1.31   .4049     .1692     .2845
1.32   .4066     .1669     .2771
1.33   .4082     .1647     .2697
1.34   .4099     .1626     .2624
1.45                                -.7243
1.46                                -.7209
1.47                                -.7172
1.48                                -.7132
1.49                                -.7089
1.50   .4332     .1295     .1457    -.7043
1.51   .4345     .1276     .1387    -.6994
1.52   .4357     .1257     .1317    -.6942
1.53   .4370     .1238     .1248    -.6888
1.54   .4382     .1219     .1180    -.6831
1.55   .4394     .1200     .1111    -.6772
1.56   .4406     .1182     .1044    -.6710
1.57   .4418     .1163     .0977    -.6646
1.58   .4430     .1145     .0911    -.6580
1.59   .4441     .1127     .0846    -.6511
1.60   .4452     .1109
1.61   .4463     .1092
1.62   .4474     .1074
1.63   .4485     .1057
1.64   .4495     .1040
1.80                      -.0341    -.4692
1.81                      -.0388    -.4593
1.82                      -.0433    -.4494
1.83                      -.0477    -.4395
1.84                      -.0521    -.4295
1.90                      -.0761    -.3693
1.91                      -.0797    -.3592
1.92                      -.0832    -.3492
1.93                      -.0867    -.3392
1.94                      -.0900    -.3292
1.95                      -.0933    -.3192
1.96                      -.0964    -.3093
1.97                      -.0994    -.2994
1.98                      -.1024    -.2895
1.99                      -.1052    -.2797


¹ Reproduced by permission from Mathematical Tables from Handbook of Chemistry and Physics, compiled by Charles D. Hodgman, 7th ed., 1941, pp. 200-204.

(1)     (2)      (3)      (4)      (5)    (1)     (2)      (3)      (4)      (5)
x/σ    Area Ordinate φ₃(x/σ)* φ₄(x/σ)*    x/σ    Area Ordinate φ₃(x/σ)* φ₄(x/σ)*
2.00  .4773    .0540   -.1080   -.2700    2.50  .4938    .0175   -.1424    .0800
2.01  .4778    .0529   -.1106   -.2603    2.51  .4940    .0171   -.1416    .0836
2.02  .4783    .0519   -.1132   -.2506    2.52  .4941    .0167   -.1408    .0871
2.03  .4788    .0508   -.1157   -.2411    2.53  .4943    .0163   -.1399    .0905
2.04  .4793    .0498   -.1180   -.2316    2.54  .4945    .0159   -.1389    .0937
2.05  .4798    .0488   -.1203   -.2222    2.55  .4946    .0155   -.1380    .0968
2.06  .4803    .0478   -.1225   -.2129    2.56  .4948    .0151   -.1370    .0998
2.07  .4808    .0468   -.1245   -.2036    2.57  .4949    .0147   -.1360    .1027
2.08  .4812    .0459   -.1265   -.1945    2.58  .4951    .0143   -.1350    .1054
2.09  .4817    .0449   -.1284   -.1854    2.59  .4952    .0139   -.1339    .1080
2.10  .4821    .0440   -.1302   -.1765    2.60  .4953    .0136   -.1328    .1105
2.11  .4826    .0431   -.1320   -.1676    2.61  .4955    .0132   -.1317    .1129
2.12  .4830    .0422   -.1336   -.1588    2.62  .4956    .0129   -.1305    .1152
2.13  .4834    .0413   -.1351   -.1502    2.63  .4957    .0126   -.1294    .1173
2.14  .4838    .0404   -.1366   -.1416    2.64  .4959    .0122   -.1282    .1194
2.15  .4842    .0396   -.1380   -.1332    2.65  .4960    .0119   -.1270    .1213
2.16  .4846    .0387   -.1393   -.1249    2.66  .4961    .0116   -.1258    .1231
2.17  .4850    .0379   -.1405   -.1167    2.67  .4962    .0113   -.1245    .1248
2.18  .4854    .0371   -.1416   -.1086    2.68  .4963    .0110   -.1233    .1264
2.19  .4857    .0363   -.1426   -.1006    2.69  .4964    .0107   -.1220    .1279
2.20  .4861    .0355   -.1436   -.0927    2.70  .4965    .0104   -.1207    .1293
2.21  .4865    .0347   -.1445   -.0850    2.71  .4966    .0101   -.1194    .1306
2.22  .4868    .0339   -.1453   -.0774    2.72  .4967    .0099   -.1181    .1317
2.23  .4871    .0332   -.1460   -.0700    2.73  .4968    .0096   -.1168    .1328
2.24  .4875    .0325   -.1467   -.0626    2.74  .4969    .0094   -.1154    .1338
2.25  .4878    .0317   -.1473   -.0554    2.75  .4970    .0091   -.1141    .1347
2.26  .4881    .0310   -.1478   -.0484    2.76  .4971    .0089   -.1127    .1356
2.27  .4884    .0303   -.1483   -.0414    2.77  .4972    .0086   -.1114    .1363
2.28  .4887    .0297   -.1486   -.0346    2.78  .4973    .0084   -.1100    .1369
2.29  .4890    .0290   -.1490   -.0279    2.79  .4974    .0081   -.1087    .1375
2.30  .4893    .0283   -.1492   -.0214    2.80  .4974    .0079   -.1073    .1379
2.31  .4896    .0277   -.1494   -.0150    2.81  .4975    .0077   -.1059    .1383
2.32  .4898    .0271   -.1495   -.0088    2.82  .4976    .0075   -.1045    .1386
2.33  .4901    .0264   -.1496   -.0027    2.83  .4977    .0073   -.1031    .1389
2.34  .4904    .0258   -.1496    .0033    2.84  .4977    .0071   -.1017    .1390
2.35  .4906    .0252   -.1495    .0092    2.85  .4978    .0069   -.1003    .1391
2.36  .4909    .0246   -.1494    .0149    2.86  .4979    .0067   -.0990    .1391
2.37  .4911    .0241   -.1492    .0204    2.87  .4980    .0065   -.0976    .1391
2.38  .4913    .0235   -.1490    .0258    2.88  .4980    .0063   -.0962    .1389
2.39  .4916    .0229   -.1487    .0311    2.89  .4981    .0061   -.0948    .1388
2.40  .4918    .0224   -.1483    .0362    2.90  .4981    .0060   -.0934    .1385
2.41  .4920    .0219   -.1480    .0412    2.91  .4982    .0058   -.0920    .1382
2.42  .4922    .0213   -.1475    .0461    2.92  .4983    .0056   -.0906    .1378
2.43  .4925    .0208   -.1470    .0508    2.93  .4983    .0055   -.0893    .1374
2.44  .4927    .0203   -.1465    .0554    2.94  .4984    .0053   -.0879    .1369
2.45  .4929    .0198   -.1459    .0598    2.95  .4984    .0051   -.0865    .1364
2.46  .4931    .0194   -.1453    .0641    2.96  .4985    .0050   -.0852    .1358
2.47  .4932    .0189   -.1446    .0683    2.97  .4985    .0049   -.0838    .1352
2.48  .4934    .0184   -.1439    .0723    2.98  .4986    .0047   -.0825    .1345
2.49  .4936    .0180   -.1432    .0762    2.99  .4986    .0046   -.0811    .1337
2.50  .4938    .0175   -.1424    .0800    3.00  .4987    .0044   -.0798    .1330
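How the four numeric columns of Table VI relate can be sketched in Python. The sketch assumes the ordinate is the standard normal density $\varphi_0(t) = e^{-t^2/2}/\sqrt{2\pi}$ at $t = x/\sigma$ and that the area is measured from the mean to t, both consistent with the printed .3989 at 0 and .3413 at 1.00. `normal_table_row` is an illustrative name, and recomputed values may differ from printed ones by a unit in the last place where the original rounding differed.

```python
import math

SQRT_2PI = math.sqrt(2 * math.pi)


def normal_table_row(t):
    """Columns (2)-(5) of Table VI at t = x/sigma, each rounded to 4 decimals."""
    ordinate = math.exp(-t * t / 2) / SQRT_2PI      # column (3): phi_0(t)
    area = 0.5 * math.erf(t / math.sqrt(2))          # column (2): area from 0 to t
    phi3 = ordinate * (3 * t - t ** 3)               # column (4): third derivative
    phi4 = ordinate * (3 - 6 * t ** 2 + t ** 4)      # column (5): fourth derivative
    return tuple(round(v, 4) for v in (area, ordinate, phi3, phi4))


if __name__ == "__main__":
    for t in (0.05, 1.0, 2.5):
        print(t, normal_table_row(t))
```

For example, `normal_table_row(2.5)` returns (0.4938, 0.0175, -0.1424, 0.08), matching the 2.50 row.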


Table VI. Areas, Ordinates, and Derivatives of the Normal Curve (Concluded)

(1)     (2)      (3)      (4)      (5)    (1)     (2)      (3)      (4)      (5)
x/σ    Area Ordinate φ₃(x/σ)* φ₄(x/σ)*    x/σ    Area Ordinate φ₃(x/σ)* φ₄(x/σ)*
4.00  .5000    .0001   -.0070    .0218    4.50  .5000    .0000   -.0012    .0047
4.01  .5000    .0001   -.0067    .0212    4.51  .5000    .0000   -.0012    .0045
4.02  .5000    .0001   -.0065    .0207    4.52  .5000    .0000   -.0012    .0044
4.03  .5000    .0001   -.0063    .0201    4.53  .5000    .0000   -.0011    .0042
4.04  .5000    .0001   -.0061    .0195    4.54  .5000    .0000   -.0011    .0041
4.05  .5000    .0001   -.0059    .0190    4.55  .5000    .0000   -.0010    .0039
4.06  .5000    .0001   -.0058    .0185    4.56  .5000    .0000   -.0010    .0038
4.07  .5000    .0001   -.0056    .0180    4.57  .5000    .0000   -.0010    .0037
4.08  .5000    .0001   -.0054    .0175    4.58  .5000    .0000   -.0009    .0035
4.09  .5000    .0001   -.0052    .0170    4.59  .5000    .0000   -.0009    .0034
4.10  .5000    .0001   -.0051    .0165    4.60  .5000    .0000   -.0009    .0033
4.11  .5000    .0001   -.0049    .0160    4.61  .5000    .0000   -.0008    .0032
4.12  .5000    .0001   -.0047    .0156    4.62  .5000    .0000   -.0008    .0031
4.13  .5000    .0001   -.0046    .0151    4.63  .5000    .0000   -.0008    .0030
4.14  .5000    .0001   -.0044    .0147    4.64  .5000    .0000   -.0007    .0028
4.15  .5000    .0001   -.0043    .0143    4.65  .5000    .0000   -.0007    .0027
4.16  .5000    .0001   -.0042    .0138    4.66  .5000    .0000   -.0007    .0026
4.17  .5000    .0001   -.0040    .0134    4.67  .5000    .0000   -.0006    .0026
4.18  .5000    .0001   -.0039    .0130    4.68  .5000    .0000   -.0006    .0025
4.19  .5000    .0001   -.0038    .0127    4.69  .5000    .0000   -.0006    .0024
4.20  .5000    .0001   -.0036    .0123    4.70  .5000    .0000   -.0006    .0023
4.21  .5000    .0001   -.0035    .0119    4.71  .5000    .0000   -.0006    .0022
4.22  .5000    .0001   -.0034    .0116    4.72  .5000    .0000   -.0005    .0021
4.23  .5000    .0001   -.0033    .0112    4.73  .5000    .0000   -.0005    .0020
4.24  .5000    .0001   -.0032    .0109    4.74  .5000    .0000   -.0005    .0020
4.25  .5000    .0001   -.0031    .0105    4.75  .5000    .0000   -.0005    .0019
4.26  .5000    .0001   -.0030    .0102    4.76  .5000    .0000   -.0005    .0018
4.27  .5000    .0000   -.0029    .0099    4.77  .5000    .0000   -.0004    .0018
4.28  .5000    .0000   -.0028    .0096    4.78  .5000    .0000   -.0004    .0017
4.29  .5000    .0000   -.0027    .0093    4.79  .5000    .0000   -.0004    .0016
4.30  .5000    .0000   -.0026    .0090    4.80  .5000    .0000   -.0004    .0016
4.31  .5000    .0000   -.0025    .0087    4.81  .5000    .0000   -.0004    .0015
4.32  .5000    .0000   -.0024    .0085    4.82  .5000    .0000   -.0004    .0015
4.33  .5000    .0000   -.0023    .0082    4.83  .5000    .0000   -.0003    .0014
4.34  .5000    .0000   -.0022    .0079    4.84  .5000    .0000   -.0003    .0013
4.35  .5000    .0000   -.0022    .0077    4.85  .5000    .0000   -.0003    .0013
4.36  .5000    .0000   -.0021    .0074    4.86  .5000    .0000   -.0003    .0012
4.37  .5000    .0000   -.0020    .0072    4.87  .5000    .0000   -.0003    .0012
4.38  .5000    .0000   -.0019    .0070    4.88  .5000    .0000   -.0003    .0012
4.39  .5000    .0000   -.0019    .0067    4.89  .5000    .0000   -.0003    .0011
4.40  .5000    .0000   -.0018    .0065    4.90  .5000    .0000   -.0003    .0011
4.41  .5000    .0000   -.0017    .0063    4.91  .5000    .0000   -.0002    .0010
4.42  .5000    .0000   -.0017    .0061    4.92  .5000    .0000   -.0002    .0010
4.43  .5000    .0000   -.0016    .0059    4.93  .5000    .0000   -.0002    .0009
4.44  .5000    .0000   -.0016    .0057    4.94  .5000    .0000   -.0002    .0009
4.45  .5000    .0000   -.0015    .0055    4.95  .5000    .0000   -.0002    .0009
4.46  .5000    .0000   -.0014    .0053    4.96  .5000    .0000   -.0002    .0008
4.47  .5000    .0000   -.0014    .0052    4.97  .5000    .0000   -.0002    .0008
4.48  .5000    .0000   -.0013    .0050    4.98  .5000    .0000   -.0002    .0008
4.49  .5000    .0000   -.0013    .0048    4.99  .5000    .0000   -.0002    .0007
4.50  .5000    .0000   -.0012    .0047






SAMPLING STATISTICS AND APPLICATIONS


Table VII. Table of t*

The column headings in this table are probabilities. The figures in the body of the table are values of t. The stub of the table shows n, the degrees of freedom. The probabilities refer to the sum of the two tails of the t distribution, i.e., for both + and - values of t. For example, for n = 10, probability .05 shows in the table t = 2.228, which means $P(|t| \ge 2.228)$ equals .05; $P(t \ge +2.228)$ equals .025; and $P(t \le -2.228)$ equals .025. The last row of the t table, for $n = \infty$, shows t values corresponding to the values obtained from the area table of the normal curve (Table VI).


  n  P = .9      .8      .7      .6      .5      .4      .3      .2      .1     .05     .02     .01
  1    .158    .325    .510    .727   1.000   1.376   1.963   3.078   6.314  12.706  31.821  63.657
  2    .142    .289    .445    .617    .816   1.061   1.386   1.886   2.920   4.303   6.965   9.925
  3    .137    .277    .424    .584    .765    .978   1.250   1.638   2.353   3.182   4.541   5.841
  4    .134    .271    .414    .569    .741    .941   1.190   1.533   2.132   2.776   3.747   4.604
  5    .132    .267    .408    .559    .727    .920   1.156   1.476   2.015   2.571   3.365   4.032
  6    .131    .265    .404    .553    .718    .906   1.134   1.440   1.943   2.447   3.143   3.707
  7    .130    .263    .402    .549    .711    .896   1.119   1.415   1.895   2.365   2.998   3.499
  8    .130    .262    .399    .546    .706    .889   1.108   1.397   1.860   2.306   2.896   3.355
  9    .129    .261    .398    .543    .703    .883   1.100   1.383   1.833   2.262   2.821   3.250
 10    .129    .260    .397    .542    .700    .879   1.093   1.372   1.812   2.228   2.764   3.169
 11    .129    .260    .396    .540    .697    .876   1.088   1.363   1.796   2.201   2.718   3.106
 12    .128    .259    .395    .539    .695    .873   1.083   1.356   1.782   2.179   2.681   3.055
 13    .128    .259    .394    .538    .694    .870   1.079   1.350   1.771   2.160   2.650   3.012
 14    .128    .258    .393    .537    .692    .868   1.076   1.345   1.761   2.145   2.624   2.977
 15    .128    .258    .393    .536    .691    .866   1.074   1.341   1.753   2.131   2.602   2.947
 16    .128    .258    .392    .535    .690    .865   1.071   1.337   1.746   2.120   2.583   2.921
 17    .128    .257    .392    .534    .689    .863   1.069   1.333   1.740   2.110   2.567   2.898
 18    .127    .257    .392    .534    .688    .862   1.067   1.330   1.734   2.101   2.552   2.878
 19    .127    .257    .391    .533    .688    .861   1.066   1.328   1.729   2.093   2.539   2.861
 20    .127    .257    .391    .533    .687    .860   1.064   1.325   1.725   2.086   2.528   2.845
 21    .127    .257    .391    .532    .686    .859   1.063   1.323   1.721   2.080   2.518   2.831
 22    .127    .256    .390    .532    .686    .858   1.061   1.321   1.717   2.074   2.508   2.819
 23    .127    .256    .390    .532    .685    .858   1.060   1.319   1.714   2.069   2.500   2.807
 24    .127    .256    .390    .531    .685    .857   1.059   1.318   1.711   2.064   2.492   2.797
 25    .127    .256    .390    .531    .684    .856   1.058   1.316   1.708   2.060   2.485   2.787
 26    .127    .256    .390    .531    .684    .856   1.058   1.315   1.706   2.056   2.479   2.779
 27    .127    .256    .389    .531    .684    .855   1.057   1.314   1.703   2.052   2.473   2.771
 28    .127    .256    .389    .530    .683    .855   1.056   1.313   1.701   2.048   2.467   2.763
 29    .127    .256    .389    .530    .683    .854   1.055   1.311   1.699   2.045   2.462   2.756
 30    .127    .256    .389    .530    .683    .854   1.055   1.310   1.697   2.042   2.457   2.750
  ∞  .12566  .25335  .38532  .52440  .67449  .84162 1.03643 1.28155 1.64485 1.95996 2.32634 2.57582


* Reprinted from Table IV of R. A. Fisher, Statistical Methods for Research Workers (Oliver & Boyd, Ltd., 
Edinburgh), by kind permission of the author and publishers. 
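The statement that the n = ∞ row reproduces normal-curve values can be checked with Python's standard library. For a two-tailed probability P, the entry should be the deviate z with $P(|Z| \ge z) = P$, that is, the $(1 - P/2)$ quantile of the standard normal. The constant and function names below are illustrative.

```python
from statistics import NormalDist

# Two-tailed probabilities heading the columns of Table VII
P_COLUMNS = [.9, .8, .7, .6, .5, .4, .3, .2, .1, .05, .02, .01]

# The n = infinity row of Table VII, as printed
T_INFINITY_ROW = [.12566, .25335, .38532, .52440, .67449, .84162,
                  1.03643, 1.28155, 1.64485, 1.95996, 2.32634, 2.57582]


def two_tailed_normal_deviate(p):
    """Return z with P(|Z| >= z) = p for a standard normal Z.

    Each tail holds p/2 of the area, so z is the (1 - p/2) quantile.
    """
    return NormalDist().inv_cdf(1 - p / 2)


if __name__ == "__main__":
    for p, printed in zip(P_COLUMNS, T_INFINITY_ROW):
        z = two_tailed_normal_deviate(p)
        print(f"P = {p:<4}  computed {z:.5f}  printed {printed:.5f}")
```

Agreement is to within about a unit in the fifth decimal; for P = .02, for instance, the computed deviate is 2.326348 against 2.32634 as printed.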
Table VIII. Table of χ²*

The column headings in this table are probabilities. The figures in the body of the table are values of χ². The stub of the table shows n, the degrees of freedom. The probability of a value of χ² equal to or greater than the value specified is given. Thus for n = 10, $P(\chi^2 \ge 3.940)$ is .95, and accordingly by subtraction $P(\chi^2 < 3.940)$ is 1.00 - .95 = .05.


  n P = .99     .98     .95     .90     .80     .70     .50     .30     .20     .10     .05     .02     .01
  1 .000157 .000628  .00393   .0158   .0642    .148    .455   1.074   1.642   2.706   3.841   5.412   6.635
  2   .0201   .0404    .103    .211    .446    .713   1.386   2.408   3.219   4.605   5.991   7.824   9.210
  3    .115    .185    .352    .584   1.005   1.424   2.366   3.665   4.642   6.251   7.815   9.837  11.341
  4    .297    .429    .711   1.064   1.649   2.195   3.357   4.878   5.989   7.779   9.488  11.668  13.277
  5    .554    .752   1.145   1.610   2.343   3.000   4.351   6.064   7.289   9.236  11.070  13.388  15.086
  6    .872   1.134   1.635   2.204   3.070   3.828   5.348   7.231   8.558  10.645  12.592  15.033  16.812
  7   1.239   1.564   2.167   2.833   3.822   4.671   6.346   8.383   9.803  12.017  14.067  16.622  18.475
  8   1.646   2.032   2.733   3.490   4.594   5.527
  9   2.088   2.532   3.325   4.168   5.380   6.393
 10   2.558   3.059   3.940   4.865   6.179   7.267
 11   3.053   3.609   4.575   5.578   6.989   8.148
 12   3.571   4.178   5.226   6.304   7.807   9.034  11.340  14.011  15.812  18.549  21.026  24.054  26.217
 13   4.107   4.765   5.892   7.042   8.634   9.926  12.340  15.119  16.985  19.812  22.362  25.472  27.688
 14   4.660   5.368   6.571   7.790   9.467  10.821  13.339  16.222  18.151  21.064  23.685  26.873  29.141
 15   5.229   5.985   7.261   8.547  10.307  11.721
 16   5.812   6.614   7.962   9.312  11.152  12.624  15.338  18.418  20.465  23.542  26.296  29.633  32.000
 17   6.408   7.255   8.672  10.085  12.002  13.531  16.338  19.511  21.615  24.769  27.587  30.995  33.409
 18   7.015   7.906   9.390  10.865  12.857  14.440  17.338  20.601  22.760  25.989  28.869  32.346  34.805
 19   7.633   8.567  10.117  11.651  13.716  15.352  18.338  21.689  23.900  27.204  30.144  33.687  36.191
 20   8.260   9.237  10.851  12.443  14.578  16.266  19.337  22.775  25.038  28.412  31.410  35.020  37.566
 21   8.897   9.915  11.591  13.240  15.445  17.182  20.337  23.858  26.171  29.615  32.671  36.343  38.932
 22   9.542  10.600  12.338  14.041  16.314  18.101  21.337  24.939  27.301  30.813  33.924  37.659  40.289
 23  10.196  11.293  13.091  14.848  17.187  19.021  22.337  26.018  28.429  32.007  35.172  38.968  41.638
 24  10.856  11.992  13.848  15.659  18.062  19.943  23.337  27.096  29.553  33.196  36.415  40.270  42.980
 25  11.524  12.697  14.611  16.473  18.940  20.867  24.337  28.172  30.675  34.382  37.652  41.566  44.314
 26  12.198  13.409  15.379  17.292  19.820  21.792  25.336  29.246  31.795  35.563  38.885  42.856  45.642
 27  12.879  14.125  16.151  18.114  20.703  22.719  26.336  30.319  32.912  36.741  40.113  44.140  46.963
 28  13.565  14.847  16.928  18.939  21.588  23.647  27.336  31.391  34.027  37.916  41.337  45.419  48.278
 29  14.256  15.574  17.708  19.768  22.475  24.577  28.336  32.461  35.139  39.087  42.557  46.693  49.588
 30  14.953  16.306  18.493  20.599  23.364  25.508  29.336  33.530  36.250  40.256  43.773  47.962  50.892

For larger values of n, the expression $\sqrt{2\chi^2} - \sqrt{2n - 1}$ may be used as a normal deviate with unit standard deviation.

* Reprinted from Table III of R. A. Fisher, Statistical Methods for Research Workers (Oliver & Boyd, Ltd., Edinburgh), by kind permission of the author and publishers.
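Each entry of Table VIII is the value x for which $P(\chi^2 \ge x)$ equals the column heading. As a checking aid, the Python sketch below computes that upper-tail probability with the standard series for the regularized lower incomplete gamma function, $P(\chi^2 \le x) = P(n/2,\, x/2)$, and also evaluates Fisher's large-n deviate from the note. The function names are illustrative, and the series is a plain textbook method, not one the table's compilers are stated to have used. It converges for every x, though it needs more terms when x is large.

```python
import math
from statistics import NormalDist


def chi2_upper_tail(x, n):
    """P(chi^2 >= x) for n degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function:
    P(chi^2 <= x) = exp(s*ln z - z - lnGamma(s)) * sum_k z**k / (s (s+1) ... (s+k)),
    with s = n/2 and z = x/2.
    """
    if x <= 0:
        return 1.0
    s, z = n / 2.0, x / 2.0
    term = 1.0 / s              # k = 0 term of the series
    total = term
    k = 0
    while term > 1e-16 * total:
        k += 1
        term *= z / (s + k)     # next term: multiply by z / (s + k)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return 1.0 - lower


def fisher_normal_deviate(x, n):
    """Fisher's approximation: sqrt(2 chi^2) - sqrt(2n - 1) is roughly a unit normal deviate."""
    return math.sqrt(2 * x) - math.sqrt(2 * n - 1)


if __name__ == "__main__":
    print(chi2_upper_tail(3.940, 10))      # the n = 10 example in the text
    z = fisher_normal_deviate(43.773, 30)   # tabled .05 point for n = 30
    print(1 - NormalDist().cdf(z))         # approximate upper-tail probability
```

At n = 30 the approximation gives roughly .047 for the tabled .05 point, which indicates the kind of accuracy to expect from it at the end of this table.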


Table X. 5 per Cent and 1 per Cent Points of the Sampling Distribution of √β₁ for Samples of Various Sizes¹
(Approximate values)

Size of   Values of √β₁ for P(√β₁ ≥ value) =
sample       .05       .01
    25      .711     1.061
    30      .661      .982
    35      .621      .921
    40      .587      .869
    45      .558      .825
    50      .533      .787
    60      .492      .723
    70      .459      .673
    80      .432      .631
    90      .409      .596
   100      .389      .567

¹ Williams, "Note on the Sampling Distribution of √β₁, where the Population Is Normal," Biometrika, Vol. 27 (1935), pp. 269-271.


Table XI. Lower and Upper 1 Per Cent and 5 Per Cent Points of the Sampling Distribution of β₂ for Samples of Various Sizes*
(Approximate values)

                          Values of β₂
Size of        For P(β₂ ≤ value)       For P(β₂ ≥ value)
sample           .01       .05            .05       .01
    100         2.18      2.35           3.77      4.39
    125         2.24      2.40           3.70      4.24
    150         2.29      2.45           3.65      4.14
    175         2.33      2.48           3.61      4.05
    200         2.37      2.51           3.57      3.98
    250         2.42      2.55           3.52      3.87
    300         2.46      2.59           3.47      3.79
    400         2.52      2.64           3.41      3.67
    500         2.57      2.67           3.37      3.60
    800         2.65      2.74           3.29      3.46
  1,000         2.68      2.76           3.26      3.41
  2,000         2.77      2.83           3.18      3.28
  5,000         2.85      2.89           3.12      3.17

* Abridged from E. S. Pearson, "A Further Development of Tests for Normality," Biometrika, Vol. 22 (1930-1931), pp. 239-249. Table assumes population to be normal.
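The following Python sketch shows how a sample β₂ might be located against Table XI. It is not from the text. It assumes the same divisor-N moment convention used for √β₁, so β₂ = m₄/m₂². `TABLE_XI` copies a few rows from the table above. The helper names and result strings are hypothetical.

```python
from typing import Sequence

# Selected rows of Table XI: n -> (lower .01, lower .05, upper .05, upper .01).
TABLE_XI = {
    100: (2.18, 2.35, 3.77, 4.39),
    200: (2.37, 2.51, 3.57, 3.98),
    500: (2.57, 2.67, 3.37, 3.60),
    1000: (2.68, 2.76, 3.26, 3.41),
}


def beta2(xs: Sequence[float]) -> float:
    """Sample kurtosis m4 / m2**2, with moments about the mean using divisor N."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def classify_beta2(b2: float, n: int) -> str:
    """Place b2 relative to the tabled lower and upper points for sample size n."""
    lo01, lo05, hi05, hi01 = TABLE_XI[n]
    if b2 <= lo01:
        return "below the lower 1% point"
    if b2 <= lo05:
        return "below the lower 5% point"
    if b2 >= hi01:
        return "above the upper 1% point"
    if b2 >= hi05:
        return "above the upper 5% point"
    return "within the tabled 5% points"
```

A value near 3, the normal-curve β₂, falls within the 5 per cent points for every tabled n.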


Table XIII. Sampling Distribution of the Range w = (X_N - X_1)/σ¹

                                   Lower tail of distribution            Upper tail of distribution
 N  Mean w  1/Mean w†  S.D. w     .001  .005  .010  .025  .050  .100    .100  .050  .025  .010  .005  .001
 2   1.128      .886      .853   .00   .01   .02   .04   .09   .18    2.33  2.77  3.17  3.64  3.97  4.65
 3   1.693      .591      .888   .06   .13   .19   .30   .43   .62    2.90  3.31  3.68  4.12  4.42  5.06
 4   2.059      .486      .880   .20   .34   .43   .59   .76   .98    3.24  3.63  3.98  4.40  4.69  5.31
 5   2.326      .430      .864   .37   .55   .66   .85  1.03  1.26    3.48  3.86  4.20  4.60  4.89  5.48
 6   2.534      .395      .848   .54   .75   .87  1.06  1.25  1.49    3.66  4.03  4.36  4.76  5.03  5.62
 7   2.704      .370      .833   .69   .92  1.05  1.25  1.44  1.68    3.81  4.17  4.49  4.88  5.15  5.73
 8   2.847      .351      .820   .83  1.08  1.20  1.41  1.60  1.83    3.93  4.29  4.61  4.99  5.26  5.82
 9   2.970      .337      .808   .96  1.21  1.34  1.55  1.74  1.97    4.04  4.39  4.70  5.08  5.34  5.90
10   3.078      .325      .797  1.08  1.33  1.47  1.67  1.86  2.09    4.13  4.47  4.79  5.16  5.42  5.97
11   3.173      .315      .787  1.20  1.45  1.58  1.78  1.97  2.20    4.21  4.55  4.86  5.23  5.49  6.04
12   3.258      .307      .778  1.30  1.55  1.68  1.88  2.07  2.30    4.29  4.62  4.92  5.29  5.54  6.09
13   3.336      .300      .770  1.38  1.64  1.77  1.97  2.16  2.39    4.35  4.69  4.99  5.35  5.60  6.15
14   3.407      .294       ...  1.47  1.72  1.86  2.06  2.24  2.47    4.41  4.74  5.04  5.40  5.65   ...
15   3.472      .288       ...  1.55  1.80  1.93  2.14  2.32  2.54    4.47  4.80  5.09  5.45  5.70  6.24
16   3.532      .283      .749  1.63  1.88  2.01  2.21  2.39  2.61    4.52  4.85  5.14  5.49  5.74  6.28
17   3.588      .279      .743  1.69   ...  2.07  2.27  2.45  2.67    4.57  4.89  5.18  5.54  5.79   ...
18   3.640      .275      .738  1.75   ...  2.14  2.34  2.51  2.73    4.61  4.93  5.22  5.58  5.82  6.35
19   3.689      .271      .733  1.82  2.07  2.20  2.39  2.57  2.79    4.65  4.97  5.26  5.61  5.86  6.38
20   3.735      .268      .729  1.88  2.13  2.25  2.45  2.63  2.84    4.69  5.01  5.30  5.65  5.89  6.41

¹ E. S. Pearson and H. O. Hartley, "The Probability Integral of the Range in Samples of n Observations from a Normal Population," Biometrika, Vol. 32 (1941-1942), pp. 301-310; ..., pp. 404-417. Table 2 in Biometrika, Vol. 32, p. 308, has been extended from N = 12 to N = 20 with the aid of ..., pp. 302-307 of the same article, and ... Biometrika, Vol. 24 (1932), p. 416. It should be noted that X_N refers to the largest and X_1 to the smallest X in the sample.

The distribution of the range in random samples from a normal population was calculated by Tippett ... it was evident that its mean and standard deviation alone would not provide all the information generally needed in practice. Tippett included ... beyond this value, there is an increasing risk that ... Normal Population ...

† This is the reciprocal of the mean w.
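The 1/Mean w column converts an average sample range into an estimate of σ. This is how the range is used in quality control. The Python sketch below makes three assumptions: several samples of the same size N are drawn from one normal population, the helper names are hypothetical, and the small Monte Carlo check reproduces the tabled mean range only approximately. The 1/Mean w values are copied from the table for N = 2 to 10.

```python
import random
import statistics
from typing import Sequence

# 1/(mean w) from Table XIII for N = 2..10.
RECIP_MEAN_RANGE = {2: 0.886, 3: 0.591, 4: 0.486, 5: 0.430, 6: 0.395,
                    7: 0.370, 8: 0.351, 9: 0.337, 10: 0.325}


def sigma_from_ranges(samples: Sequence[Sequence[float]]) -> float:
    """Estimate sigma as (average sample range) x 1/(mean w) for the common size N."""
    sizes = {len(s) for s in samples}
    if len(sizes) != 1:
        raise ValueError("all samples must have the same size N")
    n = sizes.pop()
    mean_range = statistics.mean(max(s) - min(s) for s in samples)
    return mean_range * RECIP_MEAN_RANGE[n]


def simulated_mean_range(n: int, trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean of w = (X_N - X_1)/sigma for unit normal samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(xs) - min(xs)
    return total / trials
```

For N = 5 the simulated mean range should fall close to the tabled 2.326. A single sample of five with range 4 gives an estimated σ of 4 × .430 = 1.72.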
Table XIV. Hyperbolic Tangents¹

  z    r = tanh z      z    r = tanh z      z    r = tanh z
0.00   0.00000     0.55   0.50052     1.10   0.80050
0.01    .01000     0.56    .50798     1.11    .80406
0.02    .02000     0.57    .51536     1.12    .80757
0.03    .02999     0.58    .52267     1.13    .81102
0.04    .03998     0.59    .52990     1.14    .81441
0.05   0.04996     0.60   0.53705     1.15   0.81775
0.06    .05993     0.61    .54413     1.16    .82104
0.07    .06989     0.62    .55113     1.17    .82427
0.08    .07983     0.63    .55805     1.18    .82745
0.09    .08976     0.64    .56490     1.19    .83058
0.10   0.09967     0.65   0.57167     1.20   0.83365
0.11    .10956     0.66    .57836     1.21    .83668
0.12    .11943     0.67    .58498     1.22    .83965
0.13    .12927     0.68    .59152     1.23    .84258
0.14    .13909     0.69    .59798     1.24    .84546
0.15   0.14889     0.70   0.60437     1.25   0.84828
0.16    .15865     0.71    .61068     1.26    .85106
0.17    .16838     0.72    .61691     1.27    .85380
0.18    .17808     0.73    .62307     1.28    .85648
0.19    .18775     0.74    .62915     1.29    .85913
0.20   0.19738     0.75   0.63515     1.30   0.86172
0.21    .20697     0.76    .64108     1.31    .86428
0.22    .21652     0.77    .64693     1.32    .86678
0.23    .22603     0.78    .65271     1.33    .86925
0.24    .23550     0.79    .65841     1.34    .87167
0.25   0.24492     0.80   0.66404     1.35   0.87405
0.26    .25430     0.81    .66959     1.36    .87639
0.27    .26362     0.82    .67507     1.37    .87869
0.28    .27291     0.83    .68048     1.38    .88095
0.29    .28213     0.84    .68581     1.39    .88317
0.30   0.29131     0.85   0.69107     1.40   0.88535
0.31    .30044     0.86    .69626     1.41    .88749
0.32    .30951     0.87    .70137     1.42    .88960
0.33    .31852     0.88    .70642     1.43    .89167
0.34    .32748     0.89    .71139     1.44    .89370
0.35   0.33638     0.90   0.71630     1.45   0.89569
0.36    .34521     0.91    .72113     1.46    .89765
0.37    .35399     0.92    .72590     1.47    .89958
0.38    .36271     0.93    .73059     1.48    .90147
0.39    .37136     0.94    .73522     1.49    .90332
0.40   0.37995     0.95   0.73978     1.50   0.90515
0.41    .38847     0.96    .74428     1.51    .90694
0.42    .39693     0.97    .74870     1.52    .90870
0.43    .40532     0.98    .75307     1.53    .91042
0.44    .41364     0.99    .75736     1.54    .91212
0.45   0.42190     1.00   0.76159     1.55   0.91379
0.46    .43008     1.01    .76576     1.56    .91542
0.47    .43820     1.02    .76987     1.57    .91703
0.48    .44624     1.03    .77391     1.58    .91860
0.49    .45422     1.04    .77789     1.59    .92015
0.50   0.46212     1.05   0.78181     1.60   0.92167
0.51    .46995     1.06    .78566     1.61    .92316
0.52    .47770     1.07    .78946     1.62    .92462
0.53    .48538     1.08    .79320     1.63    .92606
0.54    .49299     1.09    .79688     1.64    .92747

¹ Source: Hodgman, Charles D., Mathematical Tables from Handbook of Chemistry and Physics (1941).
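Entries of a table like Table XIV can be checked against any library hyperbolic tangent, for example after transcription. The Python sketch below is not part of the text. It flags entries that differ from `math.tanh` by more than half a unit in the fifth decimal place. The `ENTRIES` list is a hypothetical excerpt of the table, and the function name is also hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[float, float]  # (z, tabled r = tanh z)


def suspect_entries(entries: Sequence[Entry], places: int = 5) -> List[Tuple[float, float, float]]:
    """Return (z, tabled r, computed tanh z rounded) for every entry whose tabled
    value is off by more than half a unit in the last printed place."""
    tol = 0.5 * 10 ** -places + 1e-9  # small slack for floating-point noise
    return [(z, r, round(math.tanh(z), places))
            for z, r in entries
            if abs(math.tanh(z) - r) > tol]


# A few entries taken from the table (hypothetical excerpt for illustration).
ENTRIES = [(0.16, 0.15865), (0.55, 0.50052), (1.00, 0.76159), (1.64, 0.92747)]
```

An empty result means the excerpt agrees with `math.tanh` to the printed accuracy. A copying slip such as .99431 for z = 2.42 would be flagged.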
Table XIV. Hyperbolic Tangents (Continued)

  z    r = tanh z      z    r = tanh z      z    r = tanh z
1.65   0.92886     2.20   0.97574     2.75   0.99186
1.66    .93022     2.21    .97622     2.76    .99202
1.67    .93155     2.22    .97668     2.77    .99218
1.68    .93286     2.23    .97714     2.78    .99233
1.69    .93415     2.24    .97759     2.79    .99248
1.70   0.93541     2.25   0.97803     2.80   0.99263
1.71    .93665     2.26    .97846     2.81    .99278
1.72    .93786     2.27    .97888     2.82    .99292
1.73    .93906     2.28    .97929     2.83    .99306
1.74    .94023     2.29    .97970     2.84    .99320
1.75   0.94138     2.30   0.98010     2.85   0.99333
1.76    .94250     2.31    .98049     2.86    .99346
1.77    .94361     2.32    .98087     2.87    .99359
1.78    .94470     2.33    .98124     2.88    .99372
1.79    .94576     2.34    .98161     2.89    .99384
1.80   0.94681     2.35   0.98197     2.90   0.99396
1.81    .94783     2.36    .98233     2.91    .99408
1.82    .94884     2.37    .98267     2.92    .99420
1.83    .94983     2.38    .98301     2.93    .99431
1.84    .95080     2.39    .98335     2.94    .99443
1.85   0.95175     2.40   0.98367     2.95   0.99454
1.86    .95268     2.41    .98400     2.96    .99464
1.87    .95359     2.42    .98431     2.97    .99475
1.88    .95449     2.43    .98462     2.98    .99485
1.89    .95537     2.44    .98492     2.99    .99496
1.90   0.95624     2.45   0.98522      3.0   0.99505
1.91    .95709     2.46    .98551      3.1    .99595
1.92    .95792     2.47    .98579      3.2    .99668
1.93    .95873     2.48    .98607      3.3    .99728
1.94    .95953     2.49    .98635      3.4    .99777
1.95   0.96032     2.50   0.98661      3.5   0.99818
1.96    .96109     2.51    .98688      3.6    .99851
1.97    .96185     2.52    .98714      3.7    .99878
1.98    .96259     2.53    .98739      3.8    .99900
1.99    .96331     2.54    .98764      3.9    .99918
2.00   0.96403     2.55   0.98788      4.0   0.99933
2.01    .96473     2.56    .98812      4.1    .99945
2.02    .96541     2.57    .98835      4.2    .99955
2.03    .96609     2.58    .98858      4.3    .99963
2.04    .96675     2.59    .98881      4.4    .99970
2.05   0.96740     2.60   0.98903      4.5   0.99975
2.06    .96803     2.61    .98924      4.6    .99980
2.07    .96865     2.62    .98946      4.7    .99983
2.08    .96926     2.63    .98966      4.8    .99986
2.09    .96986     2.64    .98987      4.9    .99989
2.10   0.97045     2.65   0.99007      5.0   0.99991
2.11    .97103     2.66    .99026
2.12    .97159     2.67    .99045
2.13    .97215     2.68    .99064
2.14    .97269     2.69    .99083
2.15   0.97323     2.70   0.99101
2.16    .97375     2.71    .99118
2.17    .97426     2.72    .99136
2.18    .97477     2.73    .99153
2.19    .97526     2.74    .99170
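A common use of a hyperbolic-tangent table in statistics is to pass between a correlation coefficient r and z = tanh⁻¹ r. The Python sketch below makes several assumptions. It uses the usual large-sample rule that z is roughly normal with standard deviation 1/√(N - 3). It substitutes `math.atanh` and `math.tanh` for reading the table backward and forward. The function name is hypothetical. This is a sketch, not the book's own worked procedure.

```python
import math
from statistics import NormalDist
from typing import Tuple


def r_confidence_limits(r: float, n: int, coefficient: float = 0.95) -> Tuple[float, float]:
    """Approximate confidence limits for a population correlation via z = atanh(r),
    assuming z is normal with standard deviation 1 / sqrt(n - 3)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("need more than three observations")
    z = math.atanh(r)                                     # table read backward: r -> z
    se = 1.0 / math.sqrt(n - 3)
    k = NormalDist().inv_cdf(0.5 + coefficient / 2.0)     # e.g. 1.96 for 95 per cent
    return math.tanh(z - k * se), math.tanh(z + k * se)   # table read forward: z -> r
```

For r = .50 and N = 28 this gives limits of roughly .16 and .74. In the table, z = 0.55 corresponds to r = .50052, so z for r = .50 lies just under 0.55.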
SUBJECT INDEX


A 

Analysis of variance, chance varia¬ 
tion, measure of, more 
than one basis of classifica¬ 
tion, more than one case 
in each class, 435-436
single case in each class, 434 
one basis of classification, 434 
F distribution, use of, more than 
one basis of classification, 
more than one case in each 
class, 437-438, 441-442 
one case in each class, 431, 
433 

selection of region of rejection, 
428 

single basis of classification, 425 
testing correlation coefficients, 

etc., 443 

testing linearity, 444 
“interaction,” 435-436 
more than one basis of classifica¬ 
tion, more than one case 
in each class, analysis of 
variance table, 441 
numerical analysis, 438-442 
study of results, 441-442 
theoretical basis, 435-438 
single case in each class, analy¬ 
sis of variance table, 433 
comparison of results with 
those of single basis, 434 
numerical analysis, 431-433 
theoretical basis, 429-431 
nonnormal population, 449-450
remainder variance, more than 
one case in each class, 435- 
436 

one case in each class, 429-433 


Analysis of variance, single basis 
of classification, numerical 
analysis, 425-428 
theoretical basis, 423-425 
worksheet for calculating sums 
of squares, 427 

tests of correlation coefficients 
as, 442-444 
tests of linearity, 444 
Asymmetrical binomial distribution, 
betas, formulas for, 45 
derivation, 67 
characteristics, 44-46 
derivation, 40-44 
as distribution of sample per¬ 
centages, 190 
effect of changing N, 49 
formula, 43 
graphs, 44, 45 
mean, formula for, 45 
derivation, 65-66 
mode, formula for, 45 
derivation, 68 

and the normal curve, 46-47, 
68-74 

numerical examples, 42, 43 
and Pearson’s type III curve, 
47-50 

relative slope, 76-77
second approximation, 72-73 
standard deviation, formula for, 
45 

derivation, 66-67 
Average deviation, 12-13
Averages (see individual titles such 
as: Mean, Median, and Mode) 

B 

Beta coefficients, β₁ and β₂, 10-11
sampling distribution of √β₁,
242-244
Beta coefficients, sampling distri¬
bution of √β₁, table of .05 and
.10 points, 480

sampling distribution of β₂, 242-
245

table of .01 and .05 points, 480
standard error of √β₁, 242, 244
standard error of β₂, 242, 244
Binomial distribution ( see Sym¬ 
metrical binomial distribution; 
Asymmetrical binomial dis¬
tribution) 

Binomial expansion, 25-26 
Bivariate frequency distribution, 
13-22 

Boldfaced type, 10 
Breve, 10 

C 

Calculus of Observations, The, 93 
Camp, B. H., 454 

Chi square (χ²) distribution (see
Frequency curves, chi square
(χ²) distribution)

Chi square test (see Frequency
curves, testing goodness of fit,
by χ² test)

Coefficient, of correlation (see Cor¬ 
relation, coefficient of) 
of multiple correlation ( see Mul¬ 
tiple correlation, coefficient 
of) 

of risk (see Statistical inference, 
testing hypotheses, coefficient 
of risk) 

Combinations, 24-25 
Confidence coefficient (see Statis¬ 
tical inference, confidence in¬ 
tervals, confidence coefficient) 
Confidence intervals (see Statistical 
inference, confidence intervals) 
Contingency tables, 337, 422n. 

(See also Independence, test of) 
two-fold classification, 395-397 
Correlation, coefficient of, Pear-
sonian, 19

and the breakup of variance, 20-21 
confidence limits for, 301-302 


Correlation, coefficient of, Pear- 
sonian, maximum likelihood 
estimate of, 302-303 
as measure of goodness of fit, 20 
as measure of proportion of total 
variance, 21 

relationship to first-order variance, 
19-20 

sampling distribution, 298-300 
significance, 21 

testing hypotheses about, 300-301 
Correlation index, testing signifi¬ 
cance of, 306 
Correlation ratio, 21-22 
testing significance of, 306 
Cumulants, of contributory causes, 
relationship to cumulants of 
variable, 83, 94-96 
definition of, 83 
as semi-invariants, 83 

D 

Danish census (1923), 184 
Deciles, 8 

Degrees of freedom, in analysis of 
variance, 424, 434, 441 
in a contingency table, 337 
explained, 314, 318-319 
fitting a frequency curve, 332-333 
identified with n, 327-328 
Density of samples, in distribution 
of all possible samples, 256-258 
Dependent variable, estimation of, 
from sample line or plane 
of regression, 387-388 
with allowance for sampling 
errors, 388-389 
Design of experiments, 162 
Dispersion, subnormal and super¬ 
normal, 422n. 

E 

Elderton, W. P., 128, 134, 148 
Ezekiel, M., 305 

F 

F distribution (see Frequency curves, 

F distribution) 
First-order standard deviation, 18-
19 (See also Higher-order vari¬
ances)

definition, 18 
relationship to r, 19-20 
Fisher, Arne, 92 

Fisher, R. A., 11, 114n., 242, 299
Flat space, 258

Frequency curves, chi square (χ²)
distribution, description, 111- 
112 

formula, 111 
graphs, 112 
mean, 112 
mode, 112 

probabilities, table of, 475 
standard deviation, 112 
explanation of, 3-5 
F distribution, description, 112-
114 

formula, 113 
graph, 113 
mean, 113 
mode, 113 

probabilities, table of, 476-479 
standard deviation, 113 
fitting of, Gram-Charlier curves, 
133-134 

nonnormal curves, 132-137 
normal curve, 131-132 
Pearsonian curves, 134-137 
sampling curves, 137 
Sheppard’s corrections, 10, 12,
131-132, 134

Gram-Charlier (see Gram-Charlier
frequency curves)

graph, 4 

graphing of, labeling of vertical 
scale, 110n.

normal curve, 115-116 
other curves, 118-119
t, χ², and F curves, 116-117
nonnormal, examples from every¬ 
day life, 105-106 
examples from sampling analy¬ 
sis, 107-114 

normal (see Normal frequency 
curves)


Frequency curves, Pearsonian (see 
Pearsonian frequency curves) 
probabilities, computation of, 119- 
120 

for χ² curve, 125-126
for F curve, 127 
for normal curve, 120-123 
for other curves, 127-131 
for t curve, 123-124 
by quadrature formulas, 128 
as probability distributions, 28 
sampling distributions (see Sam¬ 
pling distribution) 
t distribution, approximation by 
normal curve for n between
30 and 100, 401n.
description, 109-110 
formula, 111 
graph, 110 
kurtosis, 111

mean, 111

probabilities, table of, 474 
variance, 111 

testing goodness of fit, 331-333 
by χ² test, Gram-Charlier
curves, 144-145 
normal curve, 139-142 
Pearsonian curves, 151, 152 
by graphic comparison, Gram- 
Charlier curves, 142-144 
normal curve, 137-139 
Pearsonian curves, 146-151 
theory of, conditions leading to 
nonnormality, 100-105 
conditions leading to normality, 
37-39, 100 

contributory causes, Gram- 
Charlier curves, 83-84, 90- 
92, 93-98 

nonnormal curves in general, 

100-103 

normal curve, 35-39, 100 
Pearsonian curves, 60-65 
sampling distribution of the 
mean, 107-108 
summary, 100-105 
distribution, 114n.
Frequency Curves and Correlation

134, 148 

Frequency distributions, 1-5 
bivariate, 13-14 
illustration of, 13 
Frequency series, continuous, 1-2 
discrete, 1 

Fourier integral theorem, 93, 97 
G 

Galton, Francis, 18 
Gamma function, 78-79n., 254-255 
Goodness of fit, of frequency curves 
(see Frequency curves, testing 
goodness of fit) 

Gram-Charlier frequency curves, 
assumptions, 82-83 
comparison with Pearsonian 
curves, 90-92 

comparison with Pearsonian type 
III curve, 82 
components of, 85-90 
combinations of, 86, 87, 88, 89 
graphs, 85, 87 
derivation, 82-85, 92-99 
formula for type A, 84 
general significance, 90-92 
probabilities, computation of, 145- 
146 

relationship to asymmetrical bi¬ 
nomial distribution, 91-92 
type B, 90

g statistics, 11-12, 243 
sampling distributions, 242-243 
standard errors, 243 

Guldberg, M. A., 454 

H 

Higher-order variances, confidence 
limits, 386-387 
definition, 18 

maximum-likelihood estimates, 
387 

sampling distribution, 385-386 
testing hypotheses about, 386 
use in estimating dependent vari¬ 
ables, 387-390 


Histogram, 2-3 
graph, 2 

of relative frequencies, graph, 4 
Homogeneity, meaning, 101-103 
test of, 338-339 
Hotelling, H., 452 

Hyperbolic tangents (table), 483- 
484 

Hypergeometrical distribution, char¬ 
acteristics, 55 
criterion for, 81 
derivation, 51-55 
as distribution of sample per¬ 
centages, 210-211 
formula, 54 
graph, 55 
mean, 55 
moments, 56 
numerical example, 53 
and the Pearsonian curves, 57-58 
relative slope, 79-81 
Hypergeometrical series, 54n. 
Hyperplane, 253 
Hypersphere, 254 

Hypotheses, testing of (see Statisti¬ 
cal inference, testing hypotheses) 

I 

Incomplete gamma function, 255 
Independence, meaning of, 334-335 
test of, distribution of
Σ (Nᵢ - Npᵢ)²/Npᵢ,
use of, 337-338

estimating population percent¬ 
ages, 335-337 
the null hypothesis, 334 
the problem, 333-334 

J 

Journal of the Royal Statistical 
Society, 159 

K 

Kendall, M. G., 159, 160 
k statistics, 11-12, 242-243 
Kurtosis, 11 
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L 

Law, of large numbers, 27-28
of small chances, 212 
of small numbers, 212 
Least squares, method of, 15, 374 
minimizing horizontal deviations, 
illustrated, 16 

minimizing vertical deviations, 
illustrated, 15 

Leptokurtic distributions, 11 
Lexis’ analysis, 422n. 

Likelihood ratio, 345 
Linear function, sampling distri¬ 
bution, 108-109 
Lines of regression, 16-18 
Literary Digest poll, 157 
Logarithms, of factorials, 117 
of numbers, four-place common
(table), 457-460 

representation by an infinite series,
70-71n.

M 

Mathematical Statistics, 92 
Mathematical Theory of Probabilities, 

92 

Maximum-likelihood estimates (see 
Statistical inference, maximum 
likelihood estimates of popula¬ 
tion parameters) 

Mean, arithmetic, calculation, 6-7
definition, 5-6 

joint sampling distribution of 
mean and standard deviation 
340-344 

sampling distribution, any pop¬ 
ulation, 107-108 
normal population, 231, 262- 
263 

special nonnormal popula¬ 
tions, 445-446 
standard error, 231, 268 
geometric, 9 
harmonic, 9 

Mean value, of a sum or difference, 
formula, 392 

proof of formula, 419-420 


Means, progressions of, 16 
charts, 14, 15 
Median, definition, 7 
sampling distribution, 241-242 
standard error, 242 
Meidel, M. B., 454 

Methods of Correlation Analysis, 305
Mode, 8 

Moment coefficients, 9 

Moment generating function, 93-94 

Moments, 9 

Most powerful test, 198n. 
Multinomial distribution, 309-323 
(See also Random sampling, 
from a discrete manifold 
population, distribution of 
sample percentages, and test- 
ing hypotheses about pᵢ's)
Multinomial expansion, 26 
Multiple correlation, coefficient of, 
confidence limits for, 305 
maximum-likelihood estimate 
305-306 

testing significance of, 305 
Multivariate frequency distribution 
13-14 

N

Narumi, S., 454

N-dimensional geometry, 253-254
Neyman, J., 353, 362 
Nonnormality, problem of, 445-455 
indirect attacks on, 451-455 
qualitative or semiqualitative 
methods, 451-453 
Tchebycheff’s and other in¬
equalities, 453-455
transformation of the data, 451
sampling distributions of various 
statistics for specific non¬ 
normal populations, 445-448 
use of normal theory for non¬ 
normal populations, in case 
of analysis of variance, 449- 
450 

in case of correlation coeffi¬ 
cients, 451 
Nonnormality, use of normal theory
for nonnormal populations,
in case of means, 448-449
in case of √N (X - X)/σ, 449
in case of variances, 450 
Normal frequency curve, areas under 
(table), 469-473 
characteristics, 10, 12 
conditions leading to, 37-39 
derivation, 34-35 
derivatives of (table of φ³ and
φ⁴), 469-473

first approximation to binomial 
distribution, 73 
fitting of, 131-132 
formula, 35 
graph, 121 

graphing of, 115-116 
ordinates of (table), 469-473
probabilities, computation of, 123-
124

significance, 35-37
standard form, 137-139
testing goodness of fit, by χ² test,
139-142

by graphic comparison, 137-139
truncated, 104-105
Normality, determining departure
from, by fitting a normal
curve, 131-132, 137-142
by special statistics, 242-245,
296-297

Null hypothesis, in analysis of
variance, 423-424, 430-431, 436
in comparing two samples, 391-
392

in testing difference between two
sample means, 398
percentages, 393
variances, 407-408
in testing independence, 334

P

Pabst, Margaret R., 452
Parameters, 5, 10

Partial correlation, coefficient of,
inferences about, 303-304

Pearson, E. S., 353, 362

Pearson, Karl, 57, 58, 64, 134, 148,
158, 255

Pearsonian coefficient of correlation
(see Correlation, coefficient of,
Pearsonian)

Pearsonian frequency curves, equa-
tions for, determination of,
146-151

as an explanation of nonnormal
curves, 65

general formula, 134
and the hypergeometrical dis¬
tribution, 57-58

computation of probabilities, 151,
152

relative slope, formula for, 57
types, chart for distinguishing, 136
criterion for distinguishing, 58,
135
main, 58-60
table of, 135
transitional, 60
type I, graph, 59
type III, derivation, 48-50,
76-79

formula, 49
graph, 49

type IV, graph, 59
Percentage, sampling distribution, 
190-194, 210, 211-212 
standard error, 190, 195, 210-211, 
215, 315, 319 

statistical inferences about (see 
Random sampling, from a 
discrete twofold population; 
Random sampling, from a 
discrete manifold population) 
Percentiles, 8 
Permutations, 23 
Platykurtic distributions, 11 
Poisson distribution, betas, 215, 217 
derivation, 215-216 
formula, 212 
mean, 215

derivation, 217 
moments, 217-218 
Poisson distribution, and the normal
distribution, 215, 217 
as sampling distribution of per¬ 
centages, 211-213 
testing hypotheses with, 213-215 
variance, 215 

derivation, 217-218 
Polls, public opinion (see Random 
sampling, of public opinion) 
Polygons of regression, 16 
Population, bivariate, samples from,
298

meaning of, 5 
types, 154-155 
Power of a test, 198n. 

Probability, definition of mathe¬ 
matical probability, 26-27 
dependent probabilities, 30-31 
empirical approximated, 27 
equations involving, 28-29 
independent probabilities, 30 
and law of large numbers, 27-28 
probability set, 26-27, 30 
Probability calculus, addition the¬ 
orem, 29 

meaning of mutually exclusive in, 
29 

multiplication theorem in, 30 
dependent probabilities, 30-31 
independent probabilities, 30 
Probability curves (see Frequency 
curves) 

Probability distributions, definition,
28

Probability set, 26, 30 
derived or second order, 30 
Progressions of the means (see 
Means, progressions of) 

Public opinion polls (see Random 
sampling, of public opinion) 

Q 

Quadrature formulas, 128 
Quality control, use of range in,
295-296
Quartiles, 7-8 


R 

Random sampling, from a discrete 
manifold population, assump¬ 
tions, 308-309

confidence zone for pᵢ's, 328-329
distribution of sample
Σ (Nᵢ - Npᵢ)²/Npᵢ,
explanation, 323-326
distribution of sample per¬ 
centages, derivation, 309- 
312 

formula, 311 

illustration of a skewed dis¬ 
tribution, 317-319 
illustration of symmetrical 
distribution, 312-314 
mean of, 314-317 
standard deviation of, 314- 
317 

maximum-likelihood estimates
of pᵢ's, 329-330

testing hypotheses about pᵢ's,
using distribution of
Σ (Nᵢ - Npᵢ)²/Npᵢ, 326-328, 330-331

using multinomial distribu¬ 
tion, 319-323 

from a discrete twofold popula¬ 
tion, assumptions, 186-187 
confidence coefficient and the 
confidence interval, 207-208 
distribution of difference be¬ 
tween two sample percent¬ 
ages, 393-394 

distribution of sample per¬ 
centages, derivation, 188- 
190 

formula, 190 

and the population percent¬ 
age, 191-194 

and the size of the sample, 
190-191 

estimation of population per¬ 
centage, confidence inter¬ 
vals, 202-207 
Random sampling, from a discrete
twofold population, esti¬
mation of population per¬
centage, maximum likeli¬
hood estimate, 208-209 
size of sample and the con¬ 
fidence interval, 207 
small population percentage, 
211-215 

small populations, 209-211 
testing the difference between 
two sample percentages, 
392-395 

alternative method, 395-397 
testing hypotheses, coefficient 
of risk, 195 

regions of rejection, 195-201
the test, 202 

from a normal population, con¬
fidence limits for σ², 287-290
confidence limits for X, rela¬
tionship between interval,
coefficient, and N, 272-273
σ known, 269-272
σ unknown, 280-283, 284
distribution of all possible sam¬ 
ples, derivation, 255-256 
geometrical representation, 
256-258 

properties, 257-259 
in terms of their means and 
variances, 259-262
distribution of samples of N = 2, 
derivation, 221-224, 246- 
247 

geometrical measurement of
√N (X - X)/σ, 228-229
geometrical measurement of
σ, 226-228

geometrical measurement of
σ², 226-228, 248
geometrical measurement of 
X, 225-226, 248 
numerical illustration, facing 
224 

properties, 224-229, 247-248
in terms of their means and 
variances, 248-250 


Random sampling, from a normal 
population, distribution of 
sample a's (= A.D./σ),

244-245 

use in testing departure from 
normality, 296-297 
distribution of sample means,
derivation, 262-263 
formula, 231 

(N = 2), derivation, 229-231,
250-251 

use in making inferences 
about population mean, 
267-273 

distribution of sample medians, 
241-242 

distribution of sample
√N (X - X)/σ,

derivation, 264-266 
formula, 241 

(N = 2), derivation, 236-
241, 252-253 

use in making inferences 
about population mean, 
273-284 

distribution of sample ranges, 
242 

use in analysis of variance, 

295 

use in making inferences 
about population standard 
deviation, 294-295 
use in quality control, 295- 

296 

distribution of sample standard 
deviations, derivation, 264 
formula, 236 
(N = 2), derivation, 236 
distribution of sample variances, 
derivation, 263-264
formula, 234 

(N = 2), derivation, 232- 
233, 251-252 

relation to x 2 distribution, 
234-236, 264 

use in making inferences 
about population variance, 
284-294
Random sampling, from a normal
population, distributions of 
g statistics, 243 
use in testing departure from 
normality, 296-297 
distributions of sample betas, 
242, 243-245 

use in testing departure from 
normality, 296-297

joint confidence zone for X
and σ, 366-370

joint distribution of means
and standard deviations,
derivation, 340-344
formula, 344

numerical illustrations, 343,
346

joint maximum-likelihood esti¬
mates of X and σ, 370-371

maximum-likelihood estimate of

σ², 290-294

maximum-likelihood estimate of
X, σ known, 273
σ unknown, 283-284
testing departures from nor¬
mality, by use of β₂, and
a, 296-297

testing difference between 
means of two correlated sam¬ 
ples, 403-406 

testing difference between 
means of two independent 
samples, a common known 
σ, 398-400

a common unknown σ, 400-
402

unequal σ's, 403
testing the difference between
variances of two inde¬
pendent samples, both di¬
rections considered, 411-
413

only one direction considered,
406-411

testing hypotheses about both
X and σ, 344-366
equations for λ contours, 353


Random sampling, from a normal 
population, testing hy¬
potheses about both X and
σ, lambda (λ) probability
tables, 361, 362
regions of rejection, 344-361
using corner region, 364-366
using λ contours, 345-354
testing hypotheses about σ²,
284-287, 289-290
testing hypotheses about X, σ
known, 267-269
σ known compared with σ
unknown, 277-279
σ unknown, 273-276, 284
testing whether two samples 
are from same population, 
413-416 

from a normal bivariate popula¬ 
tion, confidence limits for, 
correlation coefficient (r), 
301-302 

regression parameters, two
variables, 377 
more than two variables, 
380 

confidence limits for X₁, 387-
390 

distribution of regression sta¬
tistics, 375-376 

distribution of sample correla¬ 
tion coefficients, 298-300 
distribution of sample lines of 
regression, 380-381
distribution of sample z’s, 299- 
300 

limiting loci for population 
line of regression, 382-384 
maximum-likelihood estimate of 
correlation coefficient (r), 
302-303 

regression statistics, 372-374 
testing hypotheses about, cor¬ 
relation coefficient (r), 300-
301 

population regression para¬ 
meters, 376-377 
X₁, 381-382




Random sampling, from a normal
bivariate population, test- 
ing difference between cor¬ 
relation coefficients of two 
independent samples, 417 
regression statistics of two 
independent samples, 417- 
419 

testing for linearity, 306-307 
testing significance of η, 306
testing significance of /, 306 
from a normal multivariate popu¬ 
lation, confidence limits 
for, higher order variances, 
386-387 

multiple correlation coeffi¬ 
cient (Rᵢ.ⱼₖ...), 305
population regression para¬ 
meters, 380 

distribution of higher order 
variances, 385-386 
distributions of regression sta¬ 
tistics, 377-378 
distribution of X₁, 385
inferences about partial cor¬ 
relation coefficients, 303-304 
limiting loci for the population 
plane of regression, 385 
maximum-likelihood estimates 
of, higher order variances, 
387 

multiple correlation coeffi¬ 
cient, Rᵢ.ⱼₖ..., 305-306
regression statistics, 374-375 
testing hypotheses about, higher 
order variances, 386 
multiple correlation coeffi¬ 
cient Rᵢ.ⱼₖ..., 304-305
a regression parameter, 378- 
380 

X₁, 385

meaning, 154, 155-156 
and probability, 154 
of public opinion, 330-331 
representative, 183-184 
statistical inferences from (see 
Statistical inference) 
stratified, 183-184 


Random sampling, technique of, 
mechanical randomizing de¬ 
vices, 157-158 
natural selection, 161-162 
ordinal selection, 156-157 
random sampling numbers, 159- 
161 

tables of numbers, 158-159 
Random Sampling Numbers, 159
Range, absolute, definition, 13 
relative, definition, 13 
sampling distribution, 242, 482 
probabilities, table of, 482 
use in quality control, 295-296 
Rank correlation, 452-453 
Reciprocals of numbers, table, 467- 
468 

Region of acceptance, 164 
Region of rejection (see Statistical 
inference, testing hypotheses, 
region of rejection) 

Rietz, Henry L., 92 
Robinson, G., 93 

S 

Sampling, purposive, 184-185 
random (see Random sampling) 
reasons for, 153-154 
Sampling distribution, of a = A.D./σ, 244-245
table of .01, .05, and .10 points, 481
of √β₁, 242-244
table of .05 and .10 points, 480 
of β₂, 242-245

table of .01 and .05 points, 480 
of the correlation coefficient (r), 
298-300 

of the difference between two 
means, 399, 401 
percentages, 393-394 
z's, 417

explanation of, 164 
of g statistics, 242-243 
of a higher order variance, 385-386

of k statistics, 242-243 
of L, 412 
Sampling distribution, of \h, 415
probabilities, table of, 415 
of a linear function, 108-109 
of the mean, any population, 
107-108 

normal population, 231, 262-263 
special nonnormal populations, 
445-446 

of the median, 241-242 
of the multiple correlation coefficient, 304-305

of √N (X - X)/*, nonnormal
populations, 449 
normal population, 241, 264-266

special nonnormal populations, 
447-448 

one general distribution, 114n. 
of partial correlation coefficient, 303

of a percentage, 190-194, 210, 211-212

of the ratio of two maximum-likelihood estimates of variance, 409

of a regression coefficient, 108-109 
of the standard deviation, normal 
population, 236, 264 
special nonnormal populations, 
446-447 

standard error of, 164 
of z = tanh⁻¹ r, 299-300
use of various distributions in making inferences (see Analysis of variance; Random sampling; Statistical inference)

of the variance, nonnormal populations, 450

normal population, 234-236, 
263-264 

special nonnormal populations,
446-447 

Semi-invariants (see Cumulants) 
Sheppard, W. F., 132 
Sheppard’s corrections (see Frequency curves, fitting of, Sheppard’s corrections)


Skewness, 11 

Smith, B. Babington, 159, 160 

Squares of numbers, 100-1000 table, 
461-462 

Square roots of numbers, 100-1000
table, 689-690 

Standard deviation, definition, 12 
first order, 18-19 
higher order, 18 

sampling distribution, 236, 264, 
446-447 

Standard error, of √β₁, 242, 244
of β₂, 242, 244
definition, 164 

of the difference between two 
sample means, 399 
percentages, 394 
regression statistics, 418 
z's, 417
of β₁, 243
of β₂, 243

of the mean, 231, 268 
of the median, 173, 242 
of a percentage, 190, 195, 210-211, 215, 315, 319
of the range, 242, 482 
of regression statistics, more than 
two variables, 378 
two variables, 375 
of a sum or difference, correlated 
variables, formula, 406 
independent variables, formula, 392

proof of formulas, 420-421 
of the variance, 289 
of X'₁, 381, 382, 385
of z = tanh⁻¹ r, 301
Statistic, 5, 181 

Statistical inference, confidence intervals, confidence coefficient, 175, 176
confidence limits, 176 
definition, 174-175 
determination of, 175-176 
arbitrary elements in, 177-178

effect of size of sample, 179 
Statistical inference, estimation of
population parameters, 174-185 
maximum-likelihood estimates of population parameters, consistency of, 182
efficiency of, 182 
method, 179-182 
as “optimum” statistics, 182-183

as “unbiased estimates,” 424 
sufficiency of, 183 
symbolic representation, 10 
sampling distributions, explanation, 164

standard error, 164 
testing hypotheses, arbitrary elements in, 165-168
caution in, 339 
coefficient of risk, 164 
effect of size of sample, 171-172 
effect of statistic selected, 172-174

error I, definition, 163-164 
limiting risk of, 164-168 
error II, definition, 164 
minimizing risk of, 168-171 
power of a test, 198, 199 
the problem, 163 
region of acceptance, 164 
region of rejection, definition, 164

selection of, 166-171 
unbiased region, 201 
theory of, 163-185 

(See also Random sampling) 
Stirling’s approximation for factorials, 70-71

Symbols, table of, used in this book, 
xi 

Symmetrical binomial distribution, 
betas, 34
characteristics, 34 
conditions leading to, 37-38 
derivation of, 32-33 
formula, 33 
mean, 34 

numerical example, 33 
relative slope of, 74-75 
significance of, 35-36 
standard deviation, 34 

T 

Tables for Statisticians and Biometricians, 148, 149
Tchébychef’s inequality, 453-455
extension of, 454-455 
t distribution (see Frequency curves, 
t distribution) 

Test of goodness of fit (see Frequency curves, testing goodness of fit)

Testing hypotheses (see Statistical 
inference, testing hypotheses) 
Test of independence (see Independence, test of)

Tippett, L. H. C., 159 
Tracts for Computers, 117, 159

U 

United States census (1940), 162 
Universe (see Population) 

W 

Whittaker, E. T., 93 
Williams, P., 244 

Z 


z transformation, 299-300



